Abstract

Both statistics and quantum theory deal with prediction using probability.
We will show that a connection can be established between these two areas.
This will at the same time suggest a new, less formalistic way of looking
at basic quantum theory.

A total parameter space $\Phi$, equipped with a group $G$ of transformations,
gives the mental image of some quantum system, in such a way that only
certain components, functions of the total parameter $\phi$, can be
estimated. Choose an experiment/question $a$, and get from this a parameter
space $\Lambda^a$, perhaps after some model reduction compatible with
the group structure. As in statistics, it is important always to distinguish
between observations and parameters, in particular between (minimal)
observations $t^a$ and state variables (parameters) $\lambda^a$. Let $K^a$ be the
$L^2$-space of functions of $t^a$, and let $H^a$ be the image of $K^a$ under the
expectation operator of the model. The measure determining the $L^2$-spaces
is the invariant measure under the maximal subgroup which induces a
transformation on $\Lambda^a$.

There is a unitary connection between $K^a$ and $H^a$, and then under
natural conditions between $H^a$ and $H^b$ for different $a$ and $b$. Thus there
exists a common Hilbert space $H$ such that $H^a$ equals $U_a H$ for a unitary
$U_a$. In agreement with the common formulation of quantum mechanics,
the vectors of $H$ are taken to represent the states of the system. The state
interpretation is then: An ideal experiment $E^a$ has been performed, and
the result of this experiment, disregarding measurement errors, is that
the parameter $\lambda^a$ is equal to some fixed number $\lambda^a_k$. This essentially
statistical construction leads under natural assumptions to the basic axioms
of quantum mechanics, and thus implies a new statistical interpretation
of this traditionally very formal theory. The probabilities are introduced
via Born's formula, and this formula is proved from general, reasonable
assumptions, essentially symmetry assumptions.

The theory is illustrated by a simple macroscopic example, and by the
example of a spin 1/2 particle. As a last example we show a connection to
inference between related macroscopic experiments under symmetry.
1 Introduction



Nancy Cartwright [?] has argued that physical laws are about our models of
reality, not about reality itself. At the same time she argues (p. 186) that 
the interpretation of quantum mechanics should be seen entirely in terms of 
transition probabilities. The present paper is in agreement with both these 
statements. I will supplement the first statement, however, by saying a few 
words about models. For many reasonably complex phenomena, several models, 
and several ways to give a language for model formulation, can be imagined. 
In some instances these models, while appearing different, are so closely related 
that they give the same predictions about reality. If this is the case, I think 
many would agree with me that we should choose the model which has the most 
intuitive interpretation, also in cases where there may be given strong historical 
and culturally related arguments for other models. It may of course be the case 
that the conventional model is more suitable for calculations, but this should 
not preclude us from using more intuitive points of view when arguing about 
the model and when trying to understand more complex phenomena. 

In the present paper I try to offer such a completely different, in my view 
more intuitive, modelling approach to quantum mechanics, and I show that the 
ordinary quantum formalism, at least the time-independent aspects of it, follows
from it. 

Another purpose of this paper is to argue for a logical connection between 
quantum theory and a natural extension of statistical theory, an extension where 
symmetry aspects are taken into account by defining group transformations on 
the state space (or parameter space in statistical terminology), and where com- 
plementarity may be introduced by defining a total parameter space to which 
several potential experiments may be coupled. These different experiments may 
each be assumed to be limited by a context which is such that only a part of 
the total parameter may be estimated. 

My aim is to arrive at ordinary quantum mechanics from this extended 
statistical theory. From this one may formulate the tentative conclusion that at 
least some of the differences between the two sciences, the way the theories are 
formulated today, may have some cultural origin. 

To start such a programme, we have to relate it to conventional quantum 
theory. The basic formalism of quantum theory is given in slightly different ways 
in different books, although all agree about the foundation. For definiteness I 
will consider the following three axioms, taken from Isham [?]. In the last
Section of the paper I will give a derivation of them (and in fact more) under
suitable assumptions from the setting indicated above.

Rule 1. 

The predictions of results of measurements in quantum mechanics made on 
an otherwise isolated system are probabilistic in nature. In situations where the 
maximum amount of information is available, this probabilistic information is 
represented mathematically by a vector in a complex Hilbert space H that forms 
the state space of the quantum theory. In so far as it gives the most precise 
predictions that are possible, this vector is to be thought of as the mathematical
representation of the physical notion of 'state' of the system. 

Rule 2. 

The observables of the system are represented mathematically by self-adjoint 
operators that act on the Hilbert space H. 

Rule 3. 

If an observable quantity $\lambda^a$ is represented by the self-adjoint operator
$T^a$, and the state by the normalized vector $v \in H$, then the expected result
of the measurement is

$E_v(\lambda^a) = v^\dagger T^a v$. (1)



These three rules together with the Schrödinger equation constitute, according
to Isham, the basic axioms of quantum mechanics. In this paper I will indicate
the derivation of these rules from a setting which in essence is a natural
extension of statistical theory. I will also comment upon issues like superselection
rules and the extension to mixed states, and try to argue that formal equations
like (1) can be associated with a very natural interpretation. A basic step in
the derivation of the three rules is to give arguments for (a variant of) Born's 
celebrated formula for transition probabilities in quantum mechanics. 

Note that the states of quantum mechanics, in the non-degenerate case, can
be interpreted in the following way: Corresponding to the operator $T$ there is a
physical variable $\lambda$, and a state, as given by a state vector $v_k$, is connected to a
particular value $\lambda_k$ of $\lambda$. Thus we can say that the state is determined by two
elements: 1) A question: What is the value of $\lambda$? 2) The answer: $\lambda = \lambda_k$. In
this paper we show that we can go the other way, and start with a question/answer
pair like the one above, and then under certain assumptions, mainly related to
symmetry and to a limited context, we arrive at the Hilbert space formulation 
with the above interpretation of the state vectors. 

Consider the spin 1/2 particle, and let $a$ be any 3-vector. Then it makes
sense to say: Question: What is the spin component in direction $a$? Answer:
+1. And to say that these two elements together define a state, in ordinary
quantum mechanical language the state with eigenvalue +1 for the operator
$\sigma \cdot a$, where $\sigma$ is the vector composed of the three Pauli spin matrices.

My aim is to approach this in a less formal way than the Hilbert space 
approach, the approach that we now have become so familiar with, but which 
seems to be impossible to explain in a simple way to people that have not 
obtained this familiarity. To carry out such a programme, I want to use other, 
more direct, mental models. 

The point is that I consider the spin vector in the model as what I call
a total parameter (this name was suggested to me by Peter Jupp). A total
parameter is to me something that can be formulated in ordinary language and
is associated with a tentative model of the subatomic reality, but for which it is
far too optimistic to connect a definite value in the sense that this value can be
estimated by an experiment.

I will give examples of this below, and I will give more examples in the book [?].

What we can confront with experiments, though, is some given function of 
this spin vector, like a component of the spin vector in a certain direction, or 
more precisely, the sign of this component, say, in the spin 1/2 case. The rest 
of the spin vector will always remain unknown to us. One might of course say 
that then it is nonsense to speak about the rest of the spin vector, but I would 
say that it is useful to have a mental model. Various people may have variants 
of the same mental model, but this does not matter as long as they agree about 
the symmetry aspect and about the observable part of the spin vector. 

A relatively concrete realisation of this is given by the 'triangle in a sphere' 
which I describe below for which we are only allowed to look through one chosen 
window. Here we may have constructed the triangle with a given colour, made 
it from a definite material and so on, but to an observer in a window, all these 
parameters may only be imagined mentally. All that matters for his observations 
are the corners A, B or C, and all that matters in order to interpret these
observations is a mental model of a triangle with rotational symmetry attached 
to it. 

Or consider the case of a single patient at a fixed time, where we might be
interested in expected recovery times $\tau_1$ and $\tau_2$ under two potential treatments
(and in other parameters), so that the vector $\tau = (\tau_1, \tau_2)$ does not have an
empirical value, but one component can be estimated by an experiment.

But I repeat: The total parameter may nevertheless be a useful quantity. In 
the model context it may help us to just have a mental picture of what we think 
is going on, say, in the subatomic world. The spin vector can be red or blue, 
can be imagined to be connected to some solid body, or just be an arrow. But 
what is the same in all these mental pictures, is every single component that we 
are allowed to ask questions about. 

One very useful property of the total parameter is that we can imagine group 
transformations of it, and that these transformations then have consequences 
for the observable components. In the spin case and in the case with the triangle
in the sphere, we can think of rotations. In the example of the treatments of a
patient it is meaningful to study scale invariance: $(\tau_1, \tau_2) \mapsto (b\tau_1, b\tau_2)$.

So for such a total parameter I cannot ask every question I want, but I
can ask a question about a maximally observable component, and the question
together with its answer is what I say defines a state. Even though I start with
a mental picture involving unobservables, the state is defined in terms of an
observable quantity.

Return to the spin 1/2 case; there one can get an even more direct
characterization: Let the question be about the spin in direction $a$, and let the
answer be +1; then define, as an abbreviation for this state, the 3-vector $u = a$.
If the answer is -1, let the state be characterized by the 3-vector $u = -a$. This is
consistent, since the latter state can also be described as the result +1 for the
question about the spin in direction $-a$. Thus from my definition,
the state can be characterized by a vector $u$.

It is well known that in this case the state vectors $v$ in the Hilbert space stand
(again disregarding an irrelevant phase factor) in one-to-one correspondence with
the Bloch sphere operators $\frac{1}{2}(I + u \cdot \sigma)$, which again are in one-to-one
correspondence with the vectors $u$, vectors which can be defined as in the previous
paragraph. Thus the state can be characterized either by the Hilbert space
vector $v$, the 3-vector $u$ or the Bloch sphere operator.
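
The correspondence between the 3-vector $u$ and the Bloch sphere operator can be
checked numerically. The following is only a minimal sketch in plain Python; the
helper names (spin_operator, bloch_operator, matmul, close) and the particular
unit vector a are illustrative assumptions, not part of the derivation. It checks
that $\frac{1}{2}(I + u \cdot \sigma)$ is a projection, that for $u = a$ it projects onto the
eigenspace of $\sigma \cdot a$ with eigenvalue +1, and that for $u = -a$ the same state
answers -1 in direction $a$ and +1 in direction $-a$.

```python
# Pauli spin matrices as nested lists (illustrative, plain Python)
SIGMA = (
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
)
IDENTITY = [[1, 0], [0, 1]]


def matmul(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def spin_operator(a):
    """The operator sigma . a for a 3-vector a."""
    return [[sum(a[m] * SIGMA[m][i][j] for m in range(3)) for j in range(2)]
            for i in range(2)]


def bloch_operator(u):
    """The Bloch sphere operator (I + u . sigma) / 2 for a unit 3-vector u."""
    s = spin_operator(u)
    return [[0.5 * (IDENTITY[i][j] + s[i][j]) for j in range(2)]
            for i in range(2)]


def close(p, q, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(p[i][j] - q[i][j]) < tol for i in range(2) for j in range(2))


a = (1 / 3, 2 / 3, 2 / 3)            # an arbitrary unit vector (assumption)
minus_a = tuple(-x for x in a)
p_plus = bloch_operator(a)           # state: question a, answer +1
p_minus = bloch_operator(minus_a)    # state: question a, answer -1
negated_p_minus = [[-x for x in row] for row in p_minus]

is_projection = close(matmul(p_plus, p_plus), p_plus)
answers_plus_one = close(matmul(spin_operator(a), p_plus), p_plus)
answers_minus_one = close(matmul(spin_operator(a), p_minus), negated_p_minus)
same_as_plus_in_minus_a = close(matmul(spin_operator(minus_a), p_minus), p_minus)
print(is_projection, answers_plus_one, answers_minus_one, same_as_plus_in_minus_a)
```

The check never constructs the Hilbert space vector $v$ itself, which is only
determined up to a phase factor; it works with the operator alone.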

Later I generalize this to mixed states and further to the effects of Busch [3]
and Caves et al. [5], used below to prove the Born formula.

In the general Hilbert space every normalized vector is the eigenvector of 
some operator T, and if this corresponds to a meaningful physical variable, 
then you have a question and answer situation. In my approach there seems to
be some uniqueness here under certain assumptions.

One of my hopes has been to find a meaningful link between quantum theory 
and statistical theory, and my feeling is that we are close to that here. But I 
also feel that we then must be willing to change focus slightly in traditional 
theoretical statistics. 

In theoretical statistics there seems to be almost complete separation be- 
tween experimental design theory and inference theory. In practical experiments 
the two are linked closely together. In fact, a formulation of the experimental 
question should almost always be an important part of the conclusion, and in 
any useful investigation in biology and medicine, say, it always is. Thus here also,
in my opinion the conclusion should, in all good experiments, be stated as a 
question plus an answer. 

People who are close to applications sometimes think in terms of mental
quantities that are close to my total parameters; for instance, Searle in his book
[?] bases his treatment to a large extent on unestimable linear parameters,
motivated by the fact that this gives a nicer mental picture when you want to 
consider several models at the same time. 

A more detailed derivation of Rules 1-3, together with more comments
on several issues of quantum mechanics, including the Schrödinger equation, will be
given in the forthcoming book [?]. There is also a paper [?] in preparation,
intended for the statistical community.

2 The statistical background. 

Statistics is a tool in almost any science which collects and analyses empirical 
data. In statistical theory it is very important to distinguish between state 
variables on the one hand, and observations on the other hand. In statistics 
the state variables are usually called parameters. This distinction is unfortunately
not commonly drawn in theoretical physics, but in quantum mechanical
papers there is much discussion related to measurement apparatus. When I now
formulate the standard statistical model, it can be useful to have such an
apparatus in mind. I will return to this issue later.

First, it is useful in our setting to stress that every experiment has as its
purpose to answer one or several questions about nature. Call such an experiment
$E^a$, and let $a$ denote the set of questions that is posed in this particular
experiment. The ideal answers to these questions are the values of our state
variables $\theta^a$, or what statisticians call the parameter of experiment $E^a$. This can
for instance be the expected recovery time of patients with a particular disease
from some given population that receive a certain treatment. The observations
may then be the realized recovery times $y = (y_1, \ldots, y_n)$ for $n$ randomly selected
patients from this population. In statistical inference theory and practice such
observations act as measurements, if you like, by some measurement apparatus,
where the purpose is to make statements about the unknown reality represented
by $\theta^a$, in particular, to find an estimate $\hat{\theta}^a$ from data.

This conceptual framework has turned out to be useful in a large number 
of sciences. It is the purpose of this paper to argue that it can be of some use 
in the process of finding a non-formal foundation of quantum physics, too. The
measurement apparatus in physics can be perfect in the sense that it gives
the exact correct value, but nevertheless it is of value to distinguish between
the observations, say positions of pointers or more generally some points in a
Euclidean measure space, and the ideal values, here the parameters $\theta$.

For a general non-ideal measurement apparatus, our statistical model says
that, given the state of nature $\theta$, there is a probability distribution $Q^\theta(dy)$
for the observations $y$. The precise form of this distribution will not be important
for us here, but in macroscopic experiments it is very important, and forms
the basis for statistical inference: How to use uncertain observations to make 
statements concerning the state of nature, for example about expected recovery 
times of patients under some treatment. 

One special concept from this setting is that of a sufficient observation:
An observational function $t(y)$ is called sufficient if the conditional distribution
under the model of $y$ given $t$ is independent of the parameter $\theta$. This means
that all information about the unknown parameter is contained in the reduced
observation $t$, which may be very useful in some settings. In the simplest cases,
for instance in the recovery time example above, the mean recovery time
$\bar{y} = \sum_i y_i / n$ is sufficient for the expected recovery time $\theta$.
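
The role of the sufficient observation can be made concrete with a small
sketch. It assumes, purely for illustration, that the recovery times are
independent and exponentially distributed with expectation $\theta$; this
distributional choice, the function name and the two samples are assumptions
and are not taken from the text. Under this assumption the likelihood depends on
the data only through the mean, so two samples of the same size with the same
mean carry exactly the same information about $\theta$.

```python
import math


def exponential_log_likelihood(theta, y):
    """Log-likelihood of the expected recovery time theta, assuming
    independent exponentially distributed recovery times y (illustrative)."""
    return sum(-math.log(theta) - yi / theta for yi in y)


# Two hypothetical samples of recovery times with the same size and mean (5.0)
sample_1 = [2.0, 4.0, 9.0]
sample_2 = [5.0, 5.0, 5.0]

thetas = [1.0, 3.0, 5.0, 8.0]
loglik_1 = [exponential_log_likelihood(t, sample_1) for t in thetas]
loglik_2 = [exponential_log_likelihood(t, sample_2) for t in thetas]

# The two log-likelihood functions agree at every candidate value of theta
max_difference = max(abs(u - v) for u, v in zip(loglik_1, loglik_2))
print(max_difference)
```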

A sufficient (reduced) observation $t$ is called complete whenever the following
holds: If $h(t)$ has the property that its expectation under the model is 0 for
all $\theta$, then $h(t)$ is identically 0 (almost surely under the model). Often, when a
sufficient observation has been reduced as much as possible, it will be complete.

These technical concepts are mentioned here because they are needed at 
a specific point of our argument below. Much more information about these
concepts and related concepts can be found in advanced books in mathematical
statistics like [?]. There are many intermediate books in mathematical statistics;
one of them is [?].

We end the Section by discussing two extensions of the standard statistical 
model which are important in practice, and will be important in our approach 
to quantum theory, though they are in my mind insufficiently treated in the 
statistical literature. 

First, we discuss the genuine choice of experiments which was mentioned in 
the beginning of this Section. In ordinary statistics this is related to the issue of 
experimental design, which has its own large literature. In quantum physics we 
can think of the choice between measuring the position or the momentum of a
particle, or of measuring the spin of an electron in one of all possible directions.
In each case we have the choice between several possible experiments $E^a$, and
given each choice, we have a measurement apparatus, which in our terminology
may be modelled by a standard statistical model $Q^{\theta^a}(dy)$. Here $\theta^a$ is the aspect
of the state of nature which is related to our chosen experiment, say the 'real'
position of the particle in question.

It will be useful for us to introduce a quantity $\phi$ which is called the total
parameter of the system in question, and let the parameters of the single
experiments be functions of this total parameter: $\theta^a = \theta^a(\phi)$. In a quantum
mechanical setting, the total parameter $\phi$ will not take a value, in the sense
that it cannot be estimated from any experiment, and this may in fact happen
in a macroscopic situation also: Consider one particular patient at one point of
time, and let $\theta^1$ be his expected recovery time if he is given treatment 1, and
let $\theta^2$ be his expected recovery time if he is given treatment 2. The parameters
$\theta^1$ and $\theta^2$ are both unknown here, but only one of these can be confronted
with an experimental value. Thus $\phi = (\theta^1, \theta^2)$ cannot be considered to have a
value, except in a hypothetical sense. The reason why it nevertheless is useful
to consider such a total parameter will become clearer in the next Section.

Next: All the parameters that we have talked about have been conceptual.
In some cases the description given by the parameter may be too detailed: again
it may be impossible to give it a value, or at least impossible to estimate it
from any possible observation. One example of this was the total parameter $\phi$
above containing the result of mutually exclusive treatments. Another example
can be $\theta = (\mu, \sigma)$ where $\mu$ is the ideal measurement value given by some fragile
apparatus which is destroyed after one measurement, and where $\sigma$ is some ideal
measure of uncertainty, found by dismantling the apparatus, a process which
also destroys it.

In such cases it may sometimes be helpful to introduce a reduced parameter
$\lambda = \lambda(\theta)$, and it may then be the case that $\lambda$ actually can be estimated
through observations. In this paper we will then talk about a reduced model, a
statistical model for the observations which only depends upon the parameter
$\lambda$, not upon $\theta$. In statistical practice such reduced models are sometimes chosen
in order to improve prediction performance, even in cases where $\theta$ is estimable.
In our setting the model reductions are necessary: Information beyond $\lambda$ can
never be found. Interestingly enough, there are statistical situations (collinearity
in multiple regression) where connections between the two kinds of model
reductions can be seen.

A somewhat different example of a parameter which does not take a value
can be the expected drying time of the paint I use when painting my house,
if it turns out that I, after some consideration, choose not to paint my house at
all. Such cases are related to the field of causal inference, where the concept of
counterfactuals plays an important role.

As a final statistical concept, a perfect experiment $E$ is an experiment where
all measurement error may be disregarded, so that for the parameter $\lambda$, the
estimate may be taken to be equal to the real value of the parameter: $\hat{\lambda} = \lambda$.






3 Symmetry in statistical models. 



There is an important part of statistical theory which in addition to the statis- 
tical model assumes that there is a symmetry group acting upon the parameter 
space. The simplest such group may be induced by a change of units in the 
observations. More generally, for a scalar parameter we may have a scale group
given by $\theta \mapsto b\theta$; $b > 0$, also written $\theta g = b\theta$, a location group given by
$\theta \mapsto \theta + a$, written $\theta g = \theta + a$, or a scale and location group given by
$\theta \mapsto b\theta + a$. Another important case is the rotation group for a vector
parameter. For convenience, group actions will in this paper be written to the
right of the parameter or observation to be transformed.

In general, a group acting upon the observations induces a group acting upon
the parameters through the statistical model [?].

It is essential that the existence of such a transformation group $G$ does not
imply strong symmetry assumptions on either the parameter space or the
sample space. The main requirements are that the spaces should be closed under
the group and that inference from $(yg, \theta g)$ should be as natural as inference from
$(y, \theta)$. This and other aspects are further discussed in [?].

For our purpose, it is important that a transformation group also can be
defined upon a total parameter, even if this does not take a value. For example,
in the case $\phi = (\theta^1, \theta^2)$, where $\theta^i$ is the expected recovery time for some fixed
patient receiving treatment $i$, the time scale transformation given by
$\phi g = (b\theta^1, b\theta^2)$ is definitely meaningful.

In this paper we will assume that the total parameter space $\Phi$ is locally
compact and otherwise satisfies weak conditions which ensure that there is a
right invariant measure $\nu$ under the group $G$ on this space. This can be uniquely
defined up to a multiplicative constant by first defining a right invariant measure
$\nu_G$ on $G$ by $\nu_G(Dg) = \nu_G(D)$; $D \subset G$, $g \in G$, and then taking
$\nu(d(\phi g)) = \nu_G(dg)$. In the corresponding case of an ordinary parameter, the
right invariant measure can be recommended as a non-informative prior, a concept
which is important in Bayesian statistics.

When $\Phi$ is a total parameter space with a transformation group $G$ defined
upon it, we may also be interested in the transformations on a parameter
$\theta = \theta(\phi)$; $\phi \in \Phi$. This parameter is said to be permissible under the group $G$ if

$\theta(\phi_1) = \theta(\phi_2)$ implies $\theta(\phi_1 g) = \theta(\phi_2 g)$ for all $g \in G$.

For a permissible parameter $\theta$ the transformations $g$ can be defined in a unique
way by $(\theta g)(\phi) = \theta(\phi g)$. In many simple cases the parameter $\theta$ will be
permissible, but not always. It can be shown in general [?] that there always is a unique
maximal subgroup $G^0$ of $G$ such that the parameter $\theta$ is permissible under $G^0$.
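
Permissibility can be checked by brute force when the total parameter space is
finite. The sketch below is a toy illustration under stated assumptions: the
total parameter is taken to be an arrangement of three labels A, B, C on three
positions, the group is the full permutation group of the positions, and the
parameter is the label seen at position 0. All names (PHI, act, theta,
is_permissible) are hypothetical. The parameter turns out not to be permissible
under the full group, while the elements that respect it form the subgroup that
leaves position 0 fixed.

```python
from itertools import permutations

PHI = list(permutations("ABC"))       # hypothetical finite total parameter space
G = list(permutations(range(3)))      # all permutations of the three positions


def act(phi, g):
    """Right action phi -> phi g: position i of the result holds phi[g[i]]."""
    return tuple(phi[g[i]] for i in range(3))


def theta(phi):
    """Hypothetical parameter: the label seen at position 0."""
    return phi[0]


def is_permissible(parameter, group):
    """Check: parameter(p1) == parameter(p2) implies
    parameter(p1 g) == parameter(p2 g) for every g in the given set."""
    return all(
        parameter(act(p1, g)) == parameter(act(p2, g))
        for g in group
        for p1 in PHI
        for p2 in PHI
        if parameter(p1) == parameter(p2)
    )


# Group elements that leave position 0 fixed
stabilizer = [g for g in G if g[0] == 0]

# Group elements that individually satisfy the permissibility condition
maximal = [g for g in G if is_permissible(theta, [g])]

print(is_permissible(theta, G), is_permissible(theta, stabilizer))
print(sorted(maximal) == sorted(stabilizer))
```

In this toy case the elements satisfying the condition are closed under
composition, so they form the maximal subgroup directly.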

An orbit of a group $G$ acting upon a parameter space $\Theta$ is the set of
parameter values obtained under the group by starting at a fixed parameter
value $\theta_0$, i.e., the orbit in this case is given by $\{\theta_0 g; g \in G\}$. For a scalar
parameter $\theta$ under the scale group $\theta \mapsto b\theta$; $b > 0$ there are three orbits:
$\{\theta : \theta > 0\}$, $\{\theta : \theta < 0\}$ and then an orbit given by the single point 0. Both the
groups $\theta \mapsto \theta + a$ and $\theta \mapsto b\theta + a$ have one orbit consisting of the whole space
$\Theta$. In this case, when there is only one orbit, we say that the group action is
transitive.

For a vector parameter $\theta$, the group of rotations $\theta \mapsto C\theta$; $C^T C = I$, $\det(C) = 1$
has orbits consisting of the shells with constant $\|\theta\|$, while the group of all
non-singular matrix multiplications $\theta \mapsto A\theta$; $\det(A) \neq 0$ is nearly transitive;
precisely, it has two orbits $\{\theta : \theta \neq 0\}$ and $\{0\}$.
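
The orbit structure described above can be summarized by a label that is
constant on each orbit. The following short sketch uses hypothetical helper
names and only a rotation about one axis as a representative group element;
it labels the orbits of the scale group by the sign of $\theta$ and those of the
rotation group by the norm $\|\theta\|$.

```python
import math


def scale_orbit(theta):
    """Orbit label of a scalar theta under theta -> b * theta, b > 0:
    +1, -1 or 0 for the three orbits."""
    return (theta > 0) - (theta < 0)


def rotation_orbit(theta):
    """Orbit label of a vector theta under rotations: its Euclidean norm."""
    return math.sqrt(sum(x * x for x in theta))


def rotate_z(theta, angle):
    """One representative rotation: rotate a 3-vector about the third axis."""
    x, y, z = theta
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


theta_vec = (1.0, 2.0, 2.0)
rotated = rotate_z(theta_vec, 0.7)

# Scaling by b > 0 never changes the sign; a rotation never changes the norm
print(scale_orbit(2.5), scale_orbit(7 * 2.5), scale_orbit(-1.0), scale_orbit(0.0))
print(rotation_orbit(theta_vec), rotation_orbit(rotated))
```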

There is a general statistical inference theory under symmetry, that is, for a
statistical model endowed with a group $G$; for a summary, see [?]. One important
result is the following: Within the orbits of a group acting upon the parameter
space, one can always find an optimal estimator in a well-defined sense. This puts
limitations upon the ways model reductions should be made. More precisely: In
any statistical setting where a group is defined in a meaningful way, I will assume
that every model reduction is to an orbit or to a set of orbits of the group acting
upon the parameter space. Several examples motivating this policy further are
given in [?].

I will finally need some concepts from group representation theory. A set
of unitary operators $U(g)$; $g \in G$ acting upon a Hilbert space $H$ is called a
representation of the group $G$ if $U(g_1 g_2) = U(g_1)U(g_2)$ for all $g_1, g_2 \in G$, so
that these operators form a new group, homomorphic to $G$.

A space $V \subset H$ is invariant if $U(g)v \in V$ whenever $v \in V$ and $g \in G$. An
invariant space is called irreducible if it does not contain any proper invariant 
subspace. As is well known, unitary representations, invariant spaces and irre- 
ducible spaces play an important role in many branches of quantum mechanics. 

A simple and important example is the regular representation $U_R$, defined
on the Hilbert space $L^2(\Phi, \nu)$ of functions $f$ on the total parameter space $\Phi$ by

$U_R(g) f(\phi) = f(\phi g)$. (2)

Assume now that $\theta$ is a permissible parameter. Then it is an easy exercise to
show that

$V = \{f : f(\phi) = \tilde{f}(\theta(\phi))\}$ (3)

is an invariant space under $U_R$, and that

$U_R(g) f(\phi) = \tilde{f}(\theta(\phi g)) = \tilde{f}((\theta g)(\phi))$ (4)

on $V$.
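
The exercise can be verified by direct enumeration in a finite toy case. The
sketch below is an illustration under assumptions that are not taken from the
text: the total parameter space consists of the six arrangements of the labels
A, B, C, the group relabels the labels, and the parameter is the first label of
the arrangement, which is permissible under this action. Functions on the space
are stored as dictionaries, and all names are hypothetical. The code checks the
representation property, the invariance of $V$ and the relation in (4).

```python
from itertools import permutations

LABELS = "ABC"
PHI = list(permutations(LABELS))                            # toy total parameter space
G = [dict(zip(LABELS, p)) for p in permutations(LABELS)]    # relabellings of A, B, C


def act(phi, g):
    """Right action phi -> phi g: relabel every entry of phi with g."""
    return tuple(g[x] for x in phi)


def compose(g1, g2):
    """Group product g1 g2, meaning: apply g1 first, then g2."""
    return {x: g2[g1[x]] for x in LABELS}


def regular_rep(g, f):
    """U_R(g) f, defined by (U_R(g) f)(phi) = f(phi g)."""
    return {phi: f[act(phi, g)] for phi in PHI}


def theta(phi):
    """Toy permissible parameter: the first label of the arrangement."""
    return phi[0]


def depends_only_on_theta(f):
    """True if f(phi) is a function of theta(phi) alone, i.e. f lies in V."""
    seen = {}
    return all(seen.setdefault(theta(phi), f[phi]) == f[phi] for phi in PHI)


f_tilde = {"A": 1.0, "B": -2.0, "C": 0.5}                   # arbitrary values
f = {phi: f_tilde[theta(phi)] for phi in PHI}               # an element of V
generic = {phi: float(i) for i, phi in enumerate(PHI)}      # a function outside V

homomorphism = all(
    regular_rep(compose(g1, g2), generic) == regular_rep(g1, regular_rep(g2, generic))
    for g1 in G for g2 in G
)
invariant = all(depends_only_on_theta(regular_rep(g, f)) for g in G)
relation_4 = all(
    regular_rep(g, f)[phi] == f_tilde[g[theta(phi)]] for g in G for phi in PHI
)
print(homomorphism, invariant, relation_4)
```

Here the induced action on the parameter is $\theta g = g(\theta)$, which is what
the last check uses.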

4 The new setting for quantum mechanics. 

I am now ready to formulate what I consider to be the most natural set of 
axioms of quantum theory, axioms that may be motivated from the statistical 
discussion above. I will aim at deriving the Rules 1-3 of the Introduction from 
these axioms in the remaining parts of this paper, and in fact, I will derive more 
than that. 

In contrast to the traditional set given by Rules 1-3, I consider most of the 
axioms below to be relatively natural in the light of common sense and in the 
light of basic statistics in the way it has been formulated above. It is the totality
of assumptions which makes the core of quantum theory in the way I see it. 

Axiom 1. For a given closed system there is a total parameter space $\Phi$ whose
elements $\phi \in \Phi$ are not estimable relative to any experiment. There is a
transformation group $G$ defined on $\Phi$. The space $\Phi$ is locally compact, and the
group $G$ is transitive on $\Phi$. The right invariant measure under $G$ on $\Phi$ is called $\nu$.

Axiom 2. There is a set $\mathcal{A}$ of potential experiments $E^a$; $a \in \mathcal{A}$ on this
system. To each $a \in \mathcal{A}$ there corresponds a maximal parameter $\theta^a = \theta^a(\phi)$.
The parameter $\theta^a$ may or may not be estimable.

Axiom 3. For the case where $\theta^a$ is not estimable, there is a model reduction
$\lambda^a = \lambda^a(\theta^a)$ such that $\lambda^a$ is estimable under the experiment $E^a$. Assume that
$\lambda^a$ is maximal under this requirement.

This means that the remaining part of $\theta^a$, that is, the part left when $\lambda^a$ is
fixed, is not estimable from any experiment.

Axiom 4. For each $a \in \mathcal{A}$ there is a realization of the experiment $E^a$ (that
is, a measuring instrument) such that the observation $y$ has a complete sufficient
statistic $t^a$.

Axiom 5. When $\lambda^a$ is discrete, Axiom 4 can be strengthened to the existence
of a perfect experiment for $\lambda^a$, an experiment where measurement noise can be
disregarded and where the resulting (pure) states are determined by statements
of the form $\lambda^a = \lambda^a_k$.

Given the result of a perfect experiment $E^a$, we can predict the result of a new
perfect experiment $E^b$. When $\lambda^a$ takes only two values, we must also assume
that such a probabilistic prediction exists when we allow for a final state being
a mixed state or an effect; for definitions and interpretations see later.

Definition 1. Let $G^a$ be the maximal subgroup of $G$ with respect to which
$\lambda^a(\phi)$ is a permissible function of $\phi$. This group will always exist.

For definiteness, regarding statistical methods, it is natural to assume Bayesian
estimation in each experiment with a prior induced on each $\Lambda^a$ by $\nu$ on $\Phi$. This
is also equivalent to what is called the best equivariant estimator under the
group $G^a$, see [5].

Axiom 6. (i) For each pair of experiments $E^a, E^b$; $a, b \in \mathcal{A}$ there is an
element $k_{ab}$ of the basic group $G$ which induces a correspondence between the
respective reduced parameters:

$\lambda^b = \lambda^a k_{ab}$ or $\lambda^b(\phi) = \lambda^a(\phi k_{ab})$. (5)

(ii) The set $K = \{k_{ab}; a, b \in \mathcal{A}\}$ of transformations upon $\Phi$ constitutes a
subgroup of $G$.

Using the fact that (i) implies that, given $g^a$, one must have $\lambda^a g^a$ equal to
$\lambda^b g^b k_{ba}$ for some $g^b$, then using (i) again, it follows that
$\lambda^a(\phi g^a) = \lambda^a(\phi k_{ab} g^b k_{ba})$ for all $a$. Hence, since $G^a$ is determined by its
actions on $\Lambda^a$: $\lambda^a(\phi g^a) = (\lambda^a g^a)(\phi)$, it follows that $g^a = k_{ab} g^b k_{ba}$.
What we really need later in this paper is the implied relation between unitary
representations

$U(g^a) = U(k_{ab}) U(g^b) U^\dagger(k_{ab})$. (6)

The relation $g^a = k_{ab} g^b k_{ba}$ is what mathematicians call an inner homomorphism
between group elements, or really an isomorphism. An isomorphism
means that essentially the same group is acting upon both spaces $\Lambda^a$ and $\Lambda^b$, and
often in such cases the same group element symbol is used. We will use different
symbols, however, because the actions are related to different experiments.
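
The conjugation relation between the groups $G^a$ and $G^b$ can be illustrated on
a finite toy example. The sketch below makes several assumptions of its own:
the basic group is the group of permutations of three positions, experiment $a$
asks which label sits at position $a$, $G^a$ is the subgroup leaving position $a$
fixed, and $k_{ab}$ is taken to be the transposition of positions $a$ and $b$. The
function names are hypothetical. The code checks the correspondence (5) and
that conjugating $G^b$ with $k_{ab}$ gives exactly $G^a$.

```python
from itertools import permutations

G = list(permutations(range(3)))      # permutations of three positions
PHI = list(permutations("ABC"))       # toy total parameter space


def act(phi, g):
    """Right action phi -> phi g: position i of the result holds phi[g[i]]."""
    return tuple(phi[g[i]] for i in range(3))


def compose(g, h):
    """Product g h, chosen so that act(act(phi, g), h) == act(phi, compose(g, h))."""
    return tuple(g[h[i]] for i in range(3))


def subgroup(a):
    """Toy version of G^a: the elements that leave position a fixed."""
    return {g for g in G if g[a] == a}


def k(a, b):
    """Toy choice of k_ab: the transposition of positions a and b."""
    g = list(range(3))
    g[a], g[b] = g[b], g[a]
    return tuple(g)


def conjugate(a, b):
    """The set of all products k_ab g^b k_ba with g^b in G^b."""
    return {compose(compose(k(a, b), g), k(b, a)) for g in subgroup(b)}


pairs = [(a, b) for a in range(3) for b in range(3)]
correspondence_5 = all(act(phi, k(a, b))[a] == phi[b] for phi in PHI for a, b in pairs)
conjugation = all(conjugate(a, b) == subgroup(a) for a, b in pairs)
print(correspondence_5, conjugation)
```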

The symmetry assumptions between experiments given in Axiom 6 will turn 
out later to be crucial for the development of quantum theory. 

Axiom 7. The groups $G^a$; $a \in \mathcal{A}$ generate $G$.

We will show later that these axioms lead to the existence of a Hilbert space
$H^a$ for the experiment $E^a$ in such a way that there are basis vectors $f^a_k \in H^a$
which can be uniquely coupled to the statements that a perfect measurement of
$\lambda^a$ gives the result that $\lambda^a = \lambda^a_k$ for suitable constants $\lambda^a_k$. These Hilbert spaces
can be realized as subspaces of $L^2 = L^2(\Phi, \nu)$, where $\nu$ is the invariant measure
under $G$ assumed in Axiom 1. In particular, $H^a$ is a subspace of

$V^a = \{f \in L^2 : f(\phi) = \tilde{f}(\lambda^a(\phi))\}$. (7)

Both $H^a$ and $V^a$ are invariant under the regular representation of the group
$G^a$; in simple cases the two are equal. In general we have one of two cases:
Either $H^a$ is an irreducible space under $G^a$ or we have a decomposition
$H^a = H_1^a \oplus H_2^a \oplus \cdots$, where each $H_i^a$ is irreducible under $G^a$.

For different a, the Hilbert spaces $H^a$ turn out to be unitarily connected,
and this is crucial. It implies that there exists a common Hilbert space $H$ such
that $H^a = U^a H$ for unitary $U^a$. The space $H$ is an invariant space of the
whole group $G$ under the representation exemplified by

$W(g_1^a g_2^b g_3^c) = U^{a\dagger} U(g_1^a) U^a U^{b\dagger} U(g_2^b) U^b U^{c\dagger} U(g_3^c) U^c$. (8)

If $H$ is not irreducible under $\{W(g)\}$, then we have a decomposition
$H = H_1 \oplus H_2 \oplus \cdots$. This decomposition will correspond to the well known
superselection rules of quantum mechanics.

The last question is then whether all normalized vectors in each component
can be considered as state vectors. This will be proved later to follow from
the other assumptions above together with the following axiom, which may be
expected to hold if there are enough potential experiments involved.

Axiom 8. The unitary group generated by $\{W(g)\}$ and the phase factors
$e^{i\alpha}$ is transitive on each component $H_i$.

5 A large scale model. 

One of the simplest non-commutative groups is the group $S_3$ of permutations
of 3 objects. It has a two-dimensional representation discussed in many books 
in group theory and in several books in quantum theory. The quantum theory 
book by Wolbarst ^Hj is largely based upon this group as a pedagogical example. 

I will in this Section visualize this group by considering the permutations 
of the corners of an equilateral triangle, which can be realized physically by 
the change of position of some solid version of this triangle. This will serve to 
illustrate the axioms above on a macroscopic example. 

The spatial orientation $\phi$ of the whole triangle will be looked upon as a
hidden total parameter, and to this end we will place the solid triangle within a 
hollow nontransparent sphere, with the corners on the sphere, in such a way that 
it can rotate freely around its center point, placed at the center of the sphere. 
The basic group G is to begin with taken as the group of such rotations, but 
later, when we specialize to the corners, we will take G as the permutation 
group. Let the solid triangle be painted white on one side and black on the 
other side. 

Let there be 4 small windows in the sphere, one at the north pole, where 
the colour facing up can be observed, and three equidistant windows along the 
equator, where the closest corner of the triangle can be observed. 

The measurements made in the windows could be uncertain for some reason,
and we could model this in the ordinary statistical way by some model $Q^\lambda(dy)$,
depending upon a (reduced) parameter which can be thought of as the ideal
measurement.

Hence there are 4 reduced parameters, corresponding to the 4 different
experiments that can be done in this case, one for each window: $\lambda^0$ is the
ideal colour as observed from the north pole window; $\lambda^a$, $\lambda^b$ and $\lambda^c$ are the
three 'correct' corners of the triangle as observed from the windows a, b and c,
respectively. The term 'correct' will be defined more precisely below. The
parameter $\lambda^0$ takes the values Bl and Wh, and the parameters $\lambda^i$ for
$i = a, b, c$ each take the values A, B and C, say. All these parameters can be
considered to be functions of the triangle's spatial orientation $\phi$ within the
sphere.

Lemma 1. Both with respect to the group of permutations and with respect
to the group of rotations, $\lambda^0$ is a permissible function, while $\lambda^a$, $\lambda^b$ and
$\lambda^c$ each are non-permissible. The largest permutation group with respect to
which $\lambda^a$ is permissible, is the group of cyclic permutations of the corners of
the triangle, similarly for $\lambda^b$ and $\lambda^c$.

Proof.

Consider the 6 elements of the group $S_3$ of permutations: $g_1$ (ABC ↦ ABC),
$g_2$ (ABC ↦ CAB), $g_3$ (ABC ↦ BCA), $g_4$ (ABC ↦ ACB), $g_5$ (ABC ↦ CBA),
$g_6$ (ABC ↦ BAC).

Assume $\lambda^0(\phi_1) = \lambda^0(\phi_2)$, say black, for two total parameter values $\phi_1$ and
$\phi_2$. Then by simple inspection, $\lambda^0(g_i\phi_1) = \lambda^0(g_i\phi_2) = $ Bl for $i = 1, 2, 3$
and Wh for $i = 4, 5, 6$. Hence $\lambda^0$ is permissible.

For the other functions it is enough to produce a counterexample. Here is one
for $\lambda^a$: Let $\phi_1$ be any hyperparameter value, and by definition let ABC be the
sequence of corners in $\phi_1$ corresponding to the windows a, b, c. Put
$\phi_i = \phi_1 g_i$ for $i = 2, \ldots, 6$. Then $\lambda^a(\phi_1) = \lambda^a(\phi_4) = A$, but
$\lambda^a(\phi_1 g_5) = \lambda^a(\phi_5) = C$ and $\lambda^a(\phi_4 g_5) = \lambda^a(\phi_3) = B$. Since the group
elements $g_4$ and $g_6$ have the same structure as $g_5$ (a permutation fixing one
corner of the triangle), a similar statement holds for these group elements.

To check that $\lambda^a$ is permissible under the cyclic group, we can use direct
verification. The details are omitted. A geometric proof is simpler than an
algebraic proof.
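
The counterexample in the proof can be checked mechanically. The sketch below
rests on two modelling assumptions that are not taken from the text: a group
element acts on an arrangement by rearranging the window positions in the way it
rearranges ABC, and the colour facing up is identified with the parity of the
arrangement (even meaning black). It only reproduces the statements about
$\lambda^0$ and the counterexample for $\lambda^a$; it does not attempt the geometric
argument for the cyclic subgroup.

```python
from itertools import permutations

# Group elements g_1..g_6 from the proof, each written as the image of ABC.
G = {1: "ABC", 2: "CAB", 3: "BCA", 4: "ACB", 5: "CBA", 6: "BAC"}


def act(phi, i):
    """Return phi g_i: rearrange window positions the way g_i rearranges ABC."""
    return tuple(phi["ABC".index(ch)] for ch in G[i])


def lam_a(phi):
    """Corner seen from window a (first position)."""
    return phi[0]


def colour(phi):
    """Colour facing up; by assumption even arrangements show black."""
    inversions = sum(phi[i] > phi[j] for i in range(3) for j in range(i + 1, 3))
    return "Bl" if inversions % 2 == 0 else "Wh"


def permissible(func, elements):
    """True if func(phi1) == func(phi2) implies func(phi1 g) == func(phi2 g)."""
    states = list(permutations("ABC"))
    return all(func(act(p1, g)) == func(act(p2, g))
               for p1 in states for p2 in states if func(p1) == func(p2)
               for g in elements)


phi_1 = ("A", "B", "C")
phi = {i: act(phi_1, i) for i in G}          # phi_i = phi_1 g_i

print(lam_a(phi[1]), lam_a(phi[4]))           # window a for phi_1 and phi_4
print(lam_a(act(phi[1], 5)), lam_a(act(phi[4], 5)))
print(permissible(colour, G), permissible(lam_a, G))
```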

One can easily imagine that an ideal measurement at the window a in
principle can give more information $\theta^a$ about the position of the triangle,
but that this information is hidden. One way to make this precise is the
following: Let us divide the sphere into 3 sectors corresponding to each of the
windows a, b and c by using the meridians midway between a and b, midway between
a and c and midway between b and c as borders between the sectors. Let $S^a$ be
the sector containing window a. Define $\theta^a$ as 1) the points among the triangle
corners A, B and C that happen to be in the sector $S^a$; 2) the coordinates of
any two points which happen to belong to the same sector. By a geometric
consideration, it can be seen that $\theta^a$ is permissible with respect to the
subgroup $G^a$ of rotations around the polar axis, and only permissible with
respect to this subgroup.

From the geometry, it can be seen that $S^a$ can contain 0, 1 or 2 triangle
corners. This can be used to define $\lambda^a$ precisely: If $S^a$ contains 1 corner,
let this corner be $\lambda^a$. If $S^a$ contains 2 corners, let the closest one, as
calculated from the coordinates, be $\lambda^a$. If $S^a$ contains 0 corners, then
exactly one of its neighbouring sectors must contain 2 corners. One is then
chosen to be $\lambda^b$, respectively $\lambda^c$; let the other one be $\lambda^a$. Since the
coordinates of the corners that are in the same sector are contained in $\theta^a$,
it is seen that $\lambda^a$ is a function of $\theta^a$.

Note that the reduction from $\phi$ via $\theta$ to the parameter $\lambda$ is forced upon us
in this situation by the limitation in the possibility to make observations on the
system.

We assume that there is some mechanism to ensure that it is impossible to
look through two equatorial windows at the same time.

Axioms 1, 2 and 3 are easily seen to be satisfied for this example. As for
Axiom 4 and the first part of Axiom 5, they can be assumed to hold via
some observational system. As for the possibility of prediction in the last part
of Axiom 5, this follows from the symmetry of the system. Annihilation and
creation are not relevant for this example. Again, Axiom 6 is easily satisfied.
Axiom 7 is not satisfied. The basic group $S_3$ has an irreducible two-dimensional
representation. Then, from the theory which we will develop later, it is possible
to predict probabilistically, from an observation in window a, the results of an
observation in window b immediately afterwards.

However, in this case the prediction will be rather trivial, something that 
can be related to the fact that the subgroup under which the observation in a 
window is permissible, is the same for all windows. More complicated examples 
using the same concept can easily be imagined. 

6 The spin 1/2 particle. 

The simplest quantum mechanical system, a qubit, is realized as an electron
with its spin. The spin component $\lambda^a$ can be measured in any spatial direction
a, and $\lambda^a$ always takes one of the values -1 and +1.

In this section, we will give a non-standard, but quite intuitive description 
of a particle with spin, a description which we later will show to be equivalent 
to the one given by ordinary quantum theory. 

Look first at a general classical angular momentum. A total parameter $\phi$
corresponding to such an angular momentum may be defined as a vector in
three dimensional space; the direction of the vector gives the spin axis, and the
norm gives the spinning speed. A possible associated group $G$ is then the group
of all rotations of this vector in $R^3$ around the origin together with the changes
of the norm of the vector.

Now let the electron at the outset have such a total parameter $\phi$ attached to
it, and let $\kappa = \|\phi\|$. It is of course well known that it is impossible to obtain in
any way such detailed information about the electron spin. Now let us forget for
a moment what we know about the electron, and assume that we set forth in an
experiment $E^a$ to measure its angular momentum component $\theta^a(\phi) = \kappa\cos(\alpha)$
in some direction given by a unit vector a, where $\alpha$ is the angle between $\phi$ and a.
The measurement can be thought of as being done with a Stern-Gerlach device,
which strictly speaking measures an observable y whose distribution depends
upon $\theta^a$, implying a possibility that the parameter $\theta^a$ - or some part of it - can
be estimated from such a measurement. Given a, and given the measurement
in the direction a, the rest of the total parameter $\phi$ will be unestimable.

With respect to the group $G$, the function $\theta^a(\cdot)$ is easily seen to be
non-permissible for fixed a, simply because two vectors with the same component
along a in general will have different such components after a rotation. The
maximal possible group $G^a$ with respect to which $\theta^a$ is permissible is the group
generated by the rotations of the vector $\phi$ around the axis a, possibly together
with a 180° rotation around any axis perpendicular to a, plus a possible scale
change $\kappa \mapsto b\kappa$; $b > 0$.
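
The permissibility argument can be illustrated numerically. In the following
sketch the vectors, the rotation angles and the helper names are assumptions
chosen for illustration; the component along a is computed as the scalar product
of the total parameter with the unit vector a, which equals $\kappa\cos(\alpha)$.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def rotate(v, axis, angle):
    """Rotate v about the unit vector axis by angle (Rodrigues' formula)."""
    c, s = math.cos(angle), math.sin(angle)
    kxv = cross(axis, v)
    kv = dot(axis, v)
    return tuple(v[i] * c + kxv[i] * s + axis[i] * kv * (1 - c) for i in range(3))


def theta(phi, a):
    """Component of phi along the unit vector a, i.e. kappa * cos(alpha)."""
    return dot(phi, a)


a = (0.0, 0.0, 1.0)
phi_1 = (1.0, 0.0, 1.0)
phi_2 = (0.0, 1.0, 1.0)        # same component along a as phi_1

# A quarter turn about an axis other than a separates the two components ...
x_axis = (1.0, 0.0, 0.0)
r1, r2 = rotate(phi_1, x_axis, math.pi / 2), rotate(phi_2, x_axis, math.pi / 2)
print(theta(r1, a), theta(r2, a))

# ... a rotation about a itself leaves both components unchanged ...
s1, s2 = rotate(phi_1, a, 1.234), rotate(phi_2, a, 1.234)
print(theta(s1, a), theta(s2, a))

# ... and a half turn about a perpendicular axis flips the sign of both.
t1, t2 = rotate(phi_1, x_axis, math.pi), rotate(phi_2, x_axis, math.pi)
print(theta(t1, a), theta(t2, a))
```

In the last two cases equal components are mapped to equal components, which is
what permissibility requires.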

In analogy to the situation in the previous Section, assume now that the
electron's total parameter $\phi$ always is hidden, in such a way that for every a,
the only part of $\kappa\cos(\alpha)$ we are able to measure is the value +1 or -1, giving
the sign of this component. We call this part $\lambda^a(\phi)$. This is an extreme model
reduction compared to $\theta^a$, but interestingly enough, the model reduction is to
an orbit of the group $G^a$.

The measured part found by the Stern-Gerlach apparatus, which may contain
additional measurement noise, also takes the values ±1, and is called $\hat{\lambda}^a$.
In some instances below, we will disregard such measurement noise, and assume
the ideal condition $\hat{\lambda}^a = \lambda^a$. Such an approximation makes sense, also from a
statistical point of view, for a discrete parameter.

Finally, since the model reduction is to a parameter of fixed norm, we delete
the scale change part from the groups $G$ and $G^a$. In particular, $G$ is the group
of rotations.

The Axioms formulated earlier can again be verified for the electron as de- 
scribed above. The electron as such has a Hilbert space with only one compo- 
nent, but for some given particle which can have spin 0, 1/2, 1, 3/2 or..., the 
general representation of the rotation group gives a decomposition as indicated. 

7 Hilbert space of a single experiment. 

In the following sections we will derive the usual quantum mechanical
description of the electron from the simple description above, including the
Hilbert space with the associated interpretation of state vectors, the transition
probabilities found from Born's rule and the representation of the state vectors
for the electron spin on the Bloch or Poincaré sphere. An important point,
however, is that we will make the derivations in general terms, and to this end
we will take our basic Axioms as a point of departure. Because of space
limitation, the derivations will be rather brief, but we hope to expand on this
elsewhere.

We first look at the space S of observations, which we for simplicity can
assume is common for all potential experiments. By Axiom 4 we assume that
for each experiment $E^a$ there exists a complete sufficient statistic $t^a$ under
the (reduced) model $Q^\lambda$ for this experiment. Through the statistical model, a
transformation group on the sample space of an experiment induces a
transformation group on the parameter space. In || it is shown that for a complete,
sufficient statistic one can also go the other way. Thus $G^a$ (Definition 1) defines
a transformation group, also called $G^a$, on S. From Axiom 7 it follows that
the groups $G^a$ together generate the full group $G$, which then can be assumed
defined on S and having a right invariant measure Q on that space. Without
loss of generality we can - and will - assume that all the model measures $Q^\lambda$
are absolutely continuous with respect to Q. All this must be considered to be
details with the main purpose of establishing the measure Q on S.

We now define the Hilbert space $K^a$ as the set of all complex-valued functions
$h(t^a) \in L^2(S, Q)$ such that $f(\phi) = E^{\lambda^a}(h(t^a)) \in L^2(\Phi, \nu)$. Since sufficiency and
completeness are properties that are invariant under group transformations, it
follows that $K^a$ is an invariant space for the regular representation of the group
$G$ on $L^2(S, Q)$.

Define now the operator $A^a$ from $K^a$ to
$V^a = \{f \in L^2(\Phi, \nu) : f(\phi) = \tilde{f}(\lambda^a(\phi))\}$ by

$(A^a y)(\lambda^a(\phi)) = E^{\lambda^a}(y)$. (9)

Definition 2. Define the space $H^a \subset V^a$ by $H^a = A^a K^a$.

It is easy to see that $H^a$ is a closed subspace of $L^2(\Phi, \nu)$, and therefore a
Hilbert space.

Proposition 1. The space $H^a$ is an invariant space for the regular
representation of the group $G^a$.

A main result is now:

Theorem 1. The spaces $K^a$ and $H^a$ are unitarily related. Also, the regular
representations of the group $G^a$ on these spaces are unitarily related.

To prove this, one has to show that the mapping $A^a$ can be replaced by a
unitary map in the relation $H^a = A^a K^a$. This is shown in detail in ^H] by first
constructing an explicit relation between the two regular representations and
then using a general mathematical result stating that two representations are
unitarily equivalent if they are equivalent.

8 The quantum theoretical Hilbert space. 

The regular representation U restricted to the subgroup $G^a$ can also act upon
functions of $\lambda^a$, similarly for the representation restricted to $G^b$. Then both
$H^a$ and $H^b$ are invariant spaces for these respective representations by
Proposition 1. In this Section we will assume that these invariant spaces are
irreducible; the general case is treated later. We need

Schur's lemma [2]. If $U_1$ and $U_2$ are irreducible representations on Hilbert
spaces $H_1$ and $H_2$, and A is such that $U_1(g_1)A = AU_2(g_2)$ for all $g_1$ and some
corresponding $g_2$, then either $A = 0$ or A is an isomorphism between the spaces
$H_1$ and $H_2$.

(In [2] one assumes $g_1 = g_2$, which is related to the tradition in pure
mathematics to take abstract groups as points of departure. By contrast, we talk
about transformation groups, and then it may be natural sometimes to use
different symbols for transformations on different spaces.)

Recall from Axiom 6 and (6) that $U(g^a)U(k_{ab}) = U(k_{ab})U(g^b)$. From this
and Schur's lemma on each irreducible component it follows that we can
construct a connection between the spaces by

$H^b = U(k_{ab}) H^a$. (10)

Then (10) and Theorem 1 give an important result:

Theorem 2. There is a Hilbert space $H$, and for each $a \in \mathcal{A}$ a unitary
transformation $U^a$ such that $H^a = U^a H$. There are also unitary transformations
$V^a$ such that the observational Hilbert spaces satisfy $K^a = V^a H$.

By Axiom 7, $\{G^a; a \in \mathcal{A}\}$ generate the whole group $G$. From this it follows:

Theorem 3. The basic Hilbert space $H$ is an invariant space for the whole
group $G$.

Proof: From Proposition 1, $H^a$ is an invariant space for the group $G^a$ under
the regular representation U. Assume now that $g = g_1 g_2 g_3$, where $g_1 \in G^a$,
$g_2 \in G^b$ and $g_3 \in G^c$. Then

$W(g_1 g_2 g_3) = U^{a\dagger} U(g_1) U^a U^{b\dagger} U(g_2) U^b U^{c\dagger} U(g_3) U^c$ (11)

gives a representation on $H$ of the set of elements of $G$ that can be written as a
product $g_1 g_2 g_3$ with $g_1 \in G^a$, $g_2 \in G^b$ and $g_3 \in G^c$. Continuing in this way we
are able to construct a representation of $G$ on the space $H$.

9 The finite-dimensional case with perfect measurements.

The discussion up to now has been very general, but in the rest of this paper
I will concentrate on the case where $H$ has a finite dimension n. Then by the
unitary connection just proved, each of the spaces $H^a$ will also be n-dimensional.

Now $H^a$ is a space of functions of $\lambda^a = \lambda^a(\phi)$. Assume for simplicity that
$\lambda^a$ is a scalar; since it can be assumed to take a finite number of values, this
can always be arranged. Define on this space an operator $S^a$ by

$S^a f(\lambda^a) = \lambda^a f(\lambda^a)$. (12)

As a Hermitian operator on an n-dimensional Hilbert space, this will have n
eigenvalues $\lambda_k^a$ and corresponding eigenfunctions $f_k^a$.

Using Axiom 5, we will from now on assume that each experiment $E^a$ is
perfect, that is, measurement error can be disregarded, so that $\hat{\lambda}^a = \lambda^a$. Then
we will show that the eigenfunctions above can be chosen in a particularly simple
way, that there is no degeneracy, and that $H^a$ will be the whole space
$V^a = \{f \in L^2 : f(\phi) = \tilde{f}(\lambda^a(\phi))\}$.

Theorem 4. (i) For a perfect experiment $E^a$ associated with a
finite-dimensional Hilbert space, the space $H^a$ contains the normalized functions
$f_k^a(\phi) = \sqrt{n}\, I(\lambda^a(\phi) = \lambda_k^a)$.
(ii) These functions are eigenfunctions for $S^a$ corresponding to eigenvalues
$\lambda_k^a$. There is no degeneracy in this case.
(iii) We can take $H^a$ to be equal to $V^a$ in this case.

Proof: (i) Since the experiment is perfect and by Axiom 3 has a complete
sufficient statistic $t^a$ which then [3] is minimal, this $t^a$ must take a finite number
of values $t_k^a$ in such a way that $t_k^a$ leads to the certain conclusion $\lambda^a = \lambda_k^a$. Let
$h_k^a(t^a) = \sqrt{n}\, I(t^a = t_k^a)$, and define
$f_k^a(\phi) = E^{\lambda^a} h_k^a(t^a) = \sqrt{n}\, P^{\lambda^a}(t^a = t_k^a)$. Since
the posterior probability in general is proportional to the likelihood times the
prior, and the prior in this case is uniform on the discrete $\lambda$-values, this must
imply that the likelihood is $f_k^a(\phi) = \sqrt{n}\, I(\lambda^a(\phi) = \lambda_k^a)$. By definition, these
must belong to $H^a$.

(ii) The first part is obvious. For these particular eigenfunctions, equal
eigenvalues imply equal eigenfunctions, so there is no degeneracy.

(iii) The fact that $H^a$ is n-dimensional means that the n eigenfunctions
$f_k^a(\phi) = \sqrt{n}\, I(\lambda^a(\phi) = \lambda_k^a)$ span $H^a$ for the n different eigenvalues $\lambda_k^a$. But
these functions also span the total set of functions on $\lambda_1^a, \ldots, \lambda_n^a$, i.e., $V^a$.

Remark: Using the $L^2$-norm in $V^a$, the functions $f_k^a$ above are normalized,
since the invariant (probability) measure has mass $1/n$ on each $\lambda_k^a$.
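
The statements of Theorem 4 and the Remark can be checked on a toy example. The
sketch below assumes a parameter taking three hypothetical values with mass 1/n
each, and represents a function of the parameter as a list of its values; none of
these names come from the text.

```python
import math

# Hypothetical finite parameter: n values of lambda^a, each with mass 1/n.
values = [-1.0, 0.0, 2.0]
n = len(values)
mass = 1.0 / n


def f(k):
    """f_k as a list indexed by the values: sqrt(n) * I(lambda = values[k])."""
    return [math.sqrt(n) if j == k else 0.0 for j in range(n)]


def inner(u, v):
    """L2 inner product with respect to the uniform probability measure."""
    return sum(mass * x * y for x, y in zip(u, v))


def apply_S(u):
    """Multiplication operator (12): (S u)(lambda) = lambda * u(lambda)."""
    return [lam * x for lam, x in zip(values, u)]


for k in range(n):
    fk = f(k)
    # Squared norm of f_k, and whether S f_k equals values[k] * f_k.
    print(k, inner(fk, fk), apply_S(fk) == [values[k] * x for x in fk])
```

Distinct indicator functions are orthogonal, and since the eigenvalues are the
distinct parameter values themselves, the spectrum is non-degenerate.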

Fix c such that $H = H^c$, which has a basis $\{f_j^c\}$ given by
$f_j^c(\phi) = \sqrt{n}\, I(\lambda^c(\phi) = \lambda_j^c)$. In a different space $H^a$ these correspond to
$f_j^a(\phi) = U(k_{ca}) f_j^c(\phi)$. Now define vectors

$v_j^a = W(k_{ca}) f_j^c$, (13)

where W is the representation (8). These are eigenvectors of the selfadjoint
operator

$T^a = W(k_{ca}) S^a W(k_{ac})$ (14)

with eigenvalues $\lambda_j^a = \lambda_j^c k_{ca}$.

The information contained in the vector $v_k^a$ is simply: The perfect experiment
$E^a$ has been performed, and the result is that $\lambda^a = \lambda_k^a$.

The operator $T^a$ may be written

$T^a = \sum_{k=1}^{n} \lambda_k^a v_k^a v_k^{a\dagger}$. (15)

These operators are selfadjoint.

Remark: The present operator $S^a$ and thus $T^a$ has a non-degenerate
spectrum. This corresponds to the fact that the parameter $\lambda^a$ - though it has also
been called a reduced parameter relative to a parameter $\theta^a$ which does not take a
value - is a full parameter for the experiment $E^a$. To achieve degenerate spectra,
one must look upon partial parameters $\mu^a = \mu^a(\lambda^a)$, i.e., spin of one particle in
a two-particle system or the mass as one of several quantum numbers associated
with a particle.

10 Invariance, states and superselection rules.



In the previous Section we started with indicator functions $f_k^a$ in $H^a$ as
eigenfunctions of $S^a$ with eigenvalues $\lambda_k^a$. These are transformed in the natural
way by the group elements $g^a \in G^a$ and by the corresponding regular unitary
representation operators.

Theorem 5. If $f_k^a$ is an eigenvector for $S^a$ with eigenvalue $\lambda_k^a$, then
$U(g^a) f_k^a$ is an eigenvector with eigenvalue $\lambda_k^a g^a$.

The proof uses the fact that $U(g^a) f_k^a(\phi) = f_k^a((\lambda^a g^a)(\phi))$, so
$S^a f_k^a = \lambda_k^a f_k^a$ implies $U^\dagger(g^a) S^a U(g^a) f_k^a(\phi) = \lambda_k^a g^a f_k^a(\phi)$.

Corollary 1. Let $W(g^a) = U^{a\dagger} U(g^a) U^a$ be the operator on the basic Hilbert
space $H$ corresponding to $U(g^a)$ on $H^a$. Then the following holds: If $v_k^a$ is an
eigenvector for $T^a$ with eigenvalue $\lambda_k^a$, then $W(g^a) v_k^a$ is an eigenvector with
eigenvalue $\lambda_k^a g^a$.

It is a general theorem from representation theory [2] that every
finite-dimensional invariant space can be decomposed into irreducible invariant
spaces. Thus we have $H^a = H_1^a \oplus H_2^a \oplus \ldots$, where the $H_i^a$ are irreducible.

By doing this in both the experiments $E^a$ and $E^b$ we see that we can assume
that the relation $U(g^a)U(k_{ab}) = U(k_{ab})U(g^b)$ holds separately between
irreducible spaces on each side. From this, Schur's lemma gives that we have
relations of the form $H_j^b = U(k_{ab,ij}) H_i^a$ between all these spaces, or else
$U(k_{ab,ij}) = 0$ with no relation at all. This gives unitary relations between pairs
of irreducible spaces, one for each experiment, and the space $H^b$ has a
conformable decomposition $H_1^b \oplus H_2^b \oplus \ldots$.

It follows also that $H = H_1 \oplus H_2 \oplus \ldots$ conformably, and that there is a
unitary connection $H_i^a = U^a H_i$.

Summarizing the above discussion we have: 

Theorem 6. The basic Hilbert space $H$ can be decomposed as
$H = H_1 \oplus H_2 \oplus \ldots$, where each $H_i$ is an irreducible invariant space under
the group $G$.

All unit vectors in the Hilbert space under this decomposition are possible state
vectors. Each part corresponds to a fixed value of one or several quantities that
are conserved under all experiments.

It remains to prove that all $v \in H_i$ are possible state vectors under Axiom 8.
To this end, first observe that all state vectors are of the form
$v_k^a = W(g^c k_{ca}) f_j^c$ for some $g^c \in G^c$ and $k_{ca} \in K$.

Now every group element $g \in G$ is of the form $g^a k_{ab}$ for some b, some
$g^a \in G^a$ and some $k_{ab} \in K$. For instance, let $g = g_1^b g_2^c g_3^d$ for $g_1^b \in G^b$,
$g_2^c \in G^c$ and $g_3^d \in G^d$. Then

$g = g_1^b g_2^c (g_3^c k_{cd}) = g_1^b g_4^c k_{cd} = g_1^b (g_4^b k_{bc}) k_{cd} = g_5^b k_{bd} = g_5^a k_{ab} k_{bd} = g_5^a k_{ad}$.

From this, the set of state vectors coincides with the set of vectors of the
form $W(g) v_0$ for some $v_0 = f_j^c$.

Therefore, under Axiom 8, $\{U(g) v_0; g \in G\}$ consists of all unit vectors in $H_i$,
and all unit vectors in each irreducible space $H_i$ are state vectors.

An alternative way to formulate a non-trivial decomposition
$H = H_1 \oplus H_2 \oplus \ldots$ may be to say that not all linear combinations of state
vectors are state vectors, in particular not linear combinations of vectors that
correspond to different values of absolutely conserved quantities like mass or
charge. These superselection rules are introduced in an ad hoc way in standard
quantum mechanics, and they are neglected in many textbooks. They arise in a
quite natural way in our approach.

11 The Born formula. 

So far I have constructed the quantum mechanical Hilbert space from my setting,
and I have given a simple interpretation of the vectors of this Hilbert space:
They can all be interpreted in a question-and-answer form: 1) Choose a and
ask the question: What is the value of $\lambda^a$? 2) Do a (perfect) experiment and
give the answer: $\lambda^a = \lambda_k^a$.

What is left, is to associate probabilities to these states. In this paper I follow 
Cartwright 0] and say that all probabilities come from transition probabilities. 
These can be found from Born's formula. Later I will show, in a similar way as 
other authors have done it, that the ordinary quantum formalism follows from 
this. 

In this and the next Section I will concentrate on the case with one irreducible 
component in the Hilbert space, i.e., I will neglect superselection rules. This is 
really no restriction, since transitions always are within one component. 

By Axiom 5 there exist transition probabilities in the sense that if we know
that $\lambda^a = \lambda_k^a$, there is a well-defined probability that a later perfect experiment
$E^b$ will result in $\lambda^b = \lambda_j^b$. We must first investigate what it means that two such
final states correspond to the same state vector v.

Lemma 2. Consider a system where each experimental parameter $\lambda$ only
takes two values: -1 and +1. Assume that two vectors in $H$ satisfy $v_i^b = v_j^c$,
where $v_i^b$ corresponds to $\lambda^b = \lambda_i^b$ for a perfect experiment $E^b$ and $v_j^c$
corresponds to $\lambda^c = \lambda_j^c$ for a perfect experiment $E^c$. Then there is a 1-1 function
F such that $\lambda^c = F(\lambda^b)$ and $\lambda_j^c = F(\lambda_i^b)$.

Proof: The assumption is that

$U^{b\dagger} f_i^b = U^{c\dagger} f_j^c$. (16)

Without loss of generality, let both $\lambda_i^b$ and $\lambda_j^c$ be equal to 1. Then

$\sqrt{2}\, I(\lambda^c(\phi) = 1) = f_j^c(\phi) = U^c U^{b\dagger} f_i^b(\phi) = \sqrt{2}\, I((\lambda^b k_{bc})(\phi) = 1)$.

Thus $I(\lambda^c = 1) = I(\lambda^b k_{bc} = 1)$. But this means that the two level curves
coincide, and we must have $\lambda^c = \lambda^b k_{bc} = F(\lambda^b)$.

My aim in this Section is to prove Born's celebrated formula for transition 
probabilities in quantum mechanics. This result is related to the well known 
El Gleason's theorem, and will be proved from this theorem (together with 
Lemma 2 above) in for the case where the dimension of the Hilbert space 
is larger than 2. 

In this paper I will concentrate on the case of dimension 2. Then it seems 
necessary to use stronger assumptions, as stated in Axiom 5 above. For this 
case I will also use a more suitable version of Gleason's theorem recently proved
by Busch [3] and Caves et al. [5], which also is valid for dimension 2.

Gleason's theorem variant. Consider the set of effects on a Hilbert
space $H$, defined as the set of Hermitian operators having eigenvalues in the
unit interval. Assume that there is a generalized probability measure $\pi$ on these
effects satisfying

$0 \le \pi(E) \le 1$ for all E.

$\pi(I) = 1$.

$\pi(\sum_i E_i) = \sum_i \pi(E_i)$ whenever $\sum_i E_i \le I$.

Then there exists a density operator $\rho$, that is, a positive operator with unit
trace, such that $\pi(E) = \mathrm{tr}(\rho E)$.

Associated with this I now prove Born's formula: 

Theorem 7. Assume that we know that $\lambda^a = \lambda_i^a$ and that we are interested
in the probability that a new perfect experiment $E^b$ results in $\lambda^b = \lambda_j^b$. Then this
transition probability is given by

$P(\lambda^b = \lambda_j^b \mid \lambda^a = \lambda_i^a) = |v_i^{a\dagger} v_j^b|^2$. (17)



As already stated, I will carry out the proof below only for dimension 2. 
It can be seen from the proof that I really show more than what is stated in 
Theorem 7: There is also a corresponding formula for transition to and from 
mixed states which follows. Furthermore, in the next Section I also discuss 
concrete interpretations of the rather abstract 'effects' for the case of dimension 
2, and I actually here give a proof of the Gleason theorem variant for this case. 

Proof: For this proof fix a and i and hence the state $v_i^a$, interpreted as
$\lambda^a = \lambda_i^a$. Without loss of generality we can take $i = 1$ and $\lambda_1^a = 1$. By Lemma
2, every end state corresponding to the same vector $v = v_j^b$ must lead to the
same transition probability. By Axiom 5 we can assume an extension of this
class of transition probabilities, in the next Section shown to be equivalent to
the transition to an arbitrary effect as defined above.

A general two-dimensional Hermitian operator can be written
$E = \frac{1}{2}(rI + cu \cdot \sigma)$, where r and c are scalar, u is a 3-vector of unit norm and
$\sigma$ is a vector with the 3 Pauli spin matrices as components:

$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, $\sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$, $\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$.

The eigenvalues of E are $\frac{1}{2}(r \pm c)$. The factor $\frac{1}{2}$ is included for convenience.

In particular this representation holds for an effect, for which the conditions
$0 \le c \le 1$ and $c \le r \le 2 - c$ ensure that the eigenvalues are between 0 and
1. A very particular case is the Bloch sphere or Poincaré sphere of a pure
state density matrix $vv^\dagger$, which corresponds to the case $c = 1$ and $r = 1$. An
intermediate case is a general density matrix with $r = 1$ and $c \le 1$.

In this proof, and also in the next Section, we will study the set in the
(r, c)-plane implied by the set of effects. Here the effects are confined to the
triangle with corners (0, 0) (giving $E = 0$), (2, 0) (giving $E = I$) and (1, 1)
(corresponding to points on the Bloch sphere).
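
The correspondence between the triangle in the (r, c)-plane and the eigenvalue
condition can be checked numerically. The following sketch builds the 2x2 matrix
explicitly and computes its eigenvalues from the trace and determinant; the
sample points, the unit vector u and the function names are illustrative
assumptions.

```python
import math


def effect(r, c, u):
    """Return E = (r I + c u.sigma)/2 as a 2x2 nested list of complex numbers."""
    ux, uy, uz = u
    return [[(r + c * uz) / 2, c * (ux - 1j * uy) / 2],
            [c * (ux + 1j * uy) / 2, (r - c * uz) / 2]]


def eigenvalues(m):
    """Eigenvalues of a Hermitian 2x2 matrix from its trace and determinant."""
    tr = (m[0][0] + m[1][1]).real
    det = (m[0][0] * m[1][1] - m[0][1] * m[1][0]).real
    root = math.sqrt(max(tr * tr / 4 - det, 0.0))
    return tr / 2 - root, tr / 2 + root


def in_triangle(r, c):
    """Triangle condition from the text: 0 <= c <= 1 and c <= r <= 2 - c."""
    return 0 <= c <= 1 and c <= r <= 2 - c


u = (0.6, 0.0, 0.8)                       # hypothetical unit vector
for r, c in [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (1.0, 0.5), (1.5, 0.8)]:
    low, high = eigenvalues(effect(r, c, u))
    # Eigenvalues should equal (r - c)/2 and (r + c)/2.
    print((r, c), (low, high), in_triangle(r, c))
```

The last sample point lies outside the triangle, and correspondingly its larger
eigenvalue exceeds 1.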

I start by assuming that there exists a transition probability $\pi(r, cu)$ from
the fixed state given by $\lambda^a = 1$ to an arbitrary effect $E = \frac{1}{2}(rI + cu \cdot \sigma)$. This
will be interpreted concretely in the next Section.

The basic requirement given to probabilities on these effects is that
$\sum_i E_i \le I$ should imply $\sum_i \pi(r_i, c_i u_i) = \pi(\sum_i r_i, \sum_i c_i u_i)$. Arguments for
this requirement from simple, general assumptions will be given in the next
Section.

One way to satisfy this requirement is to let $\pi(r, cu) = \frac{1}{2}(r + ck \cdot u)$ for some
3-vector k. We will start by studying this particular solution. First, we must
have $\|k\| \le 1$ in order to get a probability $\pi$ in the interval [0, 1] for pure states.
The initial condition is that we shall have $\pi = 1$ when $r = 1$, $c = 1$ and $u = a$,
where a with norm 1 corresponds to the pure state

$\frac{1}{2}(I + a \cdot \sigma) = v_1^a v_1^{a\dagger}$.

This gives $k \cdot a = 1$, and since a and k both have norm less than or equal to 1,
then by necessity $k = a$.

But in fact, $\pi(r, cu) = \frac{1}{2}(r + ca \cdot u)$ is the only solution of the problem. This
follows from the following result, proved by Caves et al. [5]: $\pi(E)$ must be a
linear function: $\pi(\alpha E) = \alpha\pi(E)$. Since $E = \frac{1}{2}(rI + cu \cdot \sigma)$ depends linearly
upon 4 parameters $r, cu_1, cu_2, cu_3$, it follows that $\pi(r, cu)$ must have the form
$\beta r + \gamma \cdot (cu)$. Since $\pi(2, 0) = 1$, we must have $\beta = 1/2$. As just shown, this
implies $\gamma = k/2 = a/2$.

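So it remains to reformulate $\pi(r, cu) = \frac{1}{2}(r + ca \cdot u)$ for the case where the
final effect is a pure state, i.e., $r = c = 1$. To this end, assume
$vv^\dagger = \frac{1}{2}(I + u \cdot \sigma)$ and $v_1^a v_1^{a\dagger} = \frac{1}{2}(I + a \cdot \sigma)$. Then

$|v_1^{a\dagger} v|^2 = \mathrm{tr}(v_1^a v_1^{a\dagger} v v^\dagger) = \mathrm{tr}(\frac{1}{4}(I + a \cdot \sigma)(I + u \cdot \sigma))$

$= \mathrm{tr}(\frac{1}{4}(I + (a + u) \cdot \sigma + a \cdot u I)) = \frac{1}{2}(1 + a \cdot u)$.

Here the product $(a \cdot \sigma)(u \cdot \sigma)$ also contains the term $i(a \times u) \cdot \sigma$, which
has zero trace and has therefore been left out.

Thus it follows that $\pi(1, u) = |v_1^{a\dagger} v|^2$ for this case.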

Corollary 2. If the initial state is given by a density operator $\rho_0$, then the
transition probability to a final effect E is given by

$P(E \mid \rho_0) = \mathrm{tr}(\rho_0 E)$.

Proof: An extension of the same proof. The basic point is that linearity must
hold. Start with pure initial states and then take $\rho_0$ as a probability mixture.

The case of density matrices can be interpreted by standard quantum me- 
chanics. The general case will be interpreted in the next Section. 
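
The trace computation in the proof of Theorem 7, and its extension to mixed
initial states in Corollary 2, can be verified numerically for dimension 2. In
this sketch the directions a and u and the mixture weights are hypothetical
values, and the helper names are not taken from the text.

```python
def pure_state(u):
    """Bloch-sphere projector (I + u.sigma)/2 for a unit 3-vector u."""
    ux, uy, uz = u
    return [[(1 + uz) / 2, (ux - 1j * uy) / 2],
            [(ux + 1j * uy) / 2, (1 - uz) / 2]]


def trace_of_product(p, q):
    """tr(p q) for 2x2 matrices, without forming the full product."""
    return sum(p[i][k] * q[k][i] for i in range(2) for k in range(2))


def mixture(weights, states):
    """Convex combination of 2x2 matrices: a density matrix rho_0."""
    return [[sum(w * s[i][j] for w, s in zip(weights, states)) for j in range(2)]
            for i in range(2)]


a = (0.0, 0.0, 1.0)
u = (0.6, 0.0, 0.8)

# Pure initial state: tr(E(a) E(u)) against (1 + a.u)/2.
born = trace_of_product(pure_state(a), pure_state(u)).real
closed_form = (1 + sum(x * y for x, y in zip(a, u))) / 2
print(born, closed_form)

# Mixed initial state: tr(rho_0 E) is the same mixture of pure-state results.
rho_0 = mixture([0.25, 0.75], [pure_state(a), pure_state((0.0, 0.0, -1.0))])
mixed = trace_of_product(rho_0, pure_state(u)).real
print(mixed)
```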

12 Implications for the spin 1/2 particle and for 
simple decision problems. 

12.1 A perfect measurement. 

The point of departure in this Section is again that a state is given by some
question plus the answer to that question, in short, the perfect measurement $E^b$
together with $\lambda^b = \lambda_k^b$, or even shorter, just $\lambda^b = \lambda_k^b$. In the spin 1/2 case this
will correspond to a chosen direction b, and the question about what the spin
component in that direction is. The answer can be +1, corresponding in
ordinary quantum mechanical terms to a certain eigenvector $v_1^b$, or -1,
corresponding to the orthogonal eigenvector $v_2^b$.

But this means that there is a one-to-one correspondence between the pos- 
sible state functions v (disregarding an irrelevant phase factor) and 3-vectors 
$u$: Define $u = b$ if a measurement in the direction $b$ gives the result +1, and
$u = -b$ if a measurement in the direction $b$ gives the result -1. This is consistent:
a measurement along $-b$ gives the value +1 if and only if a measurement along
$b$ gives the value -1.

Many textbooks discuss the Bloch sphere representation of this result: 

$$v^b_1 v^{b\dagger}_1 = E_1(b) = \tfrac{1}{2}(I + b \cdot \sigma) \qquad (18)$$

$$v^b_2 v^{b\dagger}_2 = E_2(b) = \tfrac{1}{2}(I - b \cdot \sigma) \qquad (19)$$

I will let the matrix $E(u) = E_1(u)$ represent the statement that a perfect
measurement along the direction $b$ has given the value +1 when $u = b$, and the
value -1 when $u = -b$. This is again consistent, and $\{E(u)\}$ is in one-to-one
correspondence to the unit 3-vectors $\{u\}$.

This gives the specification of a state corresponding to a perfect measurement,
and every quantum mechanical state vector for a spin 1/2 particle can be
represented in this way: A question about a spin component in some direction,
and the answer from a perfect experiment. This is perfectly in agreement with
the general results given in earlier sections of this paper; for qubits we can, as
just shown, devise a more direct construction.

The issue of transition probabilities comes after this in my treatment. As- 
sume some initial direction a. The state corresponding to the result that a 
measurement in this direction has resulted in +1, can be called v(a) or E(a) 
or simply a. Starting from this, one can compute the probability of getting +1 
in a new direction $b$ by Born's formula. The result can be expressed in many
ways, a simple one being $P(b \mid a) = (1 + \cos(w))/2$, where $w$ is the angle between
$a$ and $b$.
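
As a worked check, assuming Born's formula in the form $\pi(b) = \frac{1}{2}(1 + a \cdot b)$ obtained in the previous Section, and assuming that $a$ and $b$ are unit vectors so that $a \cdot b = \cos w$, the two outcome probabilities can be written out as follows.

```latex
P(b \mid a) = \tfrac{1}{2}(1 + a \cdot b) = \tfrac{1}{2}(1 + \cos w) = \cos^2(w/2),
\qquad
P(-b \mid a) = \tfrac{1}{2}(1 - \cos w) = \sin^2(w/2).
```

The two probabilities sum to 1. For $w = 0$ the result +1 is certain, for $w = \pi$ it is impossible, and for orthogonal directions both results have probability 1/2.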

This is a derived result from Born's formula and the theory related to the 
state characterization. The characterization of the state itself can most simply 
in my view be given as above. 

Born's formula is again, in the view of the present paper, a result of the 
assumption that the probabilities exist, not only from pure states to pure states, 
but to mixed states and also to the more general states called effects. Thus it 
becomes important to study these extended state concepts from a statistical 
point of view. 

12.2 Allowing measurement errors. 

Let the Stern-Gerlach apparatus be imperfect: Given that the real value is +1,
it gives -1 with a certain probability and given that the real value is -1, it gives
+1 with some probability. Then a measurement +1 can be obtained in two
ways: The right way or the wrong way.

From this model, from data and from some prior on the two states, one finds
posterior probabilities for the two states.

Assume then that the reported state is +1, and that from this there is an
a posteriori probability $p_1 < 1/2$ that the correct state is -1. Then the true state
is found by probability weighting:

$$E_1(b, p_1) = (1 - p_1)\tfrac{1}{2}(I + b \cdot \sigma) + p_1 \tfrac{1}{2}(I - b \cdot \sigma) = \tfrac{1}{2}(I + (1 - 2p_1)\, b \cdot \sigma) \qquad (20)$$

It is well known from the theory of Bloch spheres that any mixed state can
be found by replacing $b$ for the pure state by $cb$ for $0 \le c < 1$. Here $1 - 2p_1$ is
a general $c$ in this interval. Hence any mixed state can be represented by such
a measurement, in particular, by a vector $b$ and an error probability $p_1$.

An important point is that both $p_1$ and the unit vector $b$ can be recovered
if the density matrix $E_1$ is given.
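
The recovery of $p_1$ and $b$ can be sketched as follows. This is an illustration under stated assumptions, not a procedure taken from the paper: the function names `density_matrix` and `recover` are hypothetical, and the recovery uses the standard fact that the Bloch vector of a qubit density matrix has components tr(E_1 σ_k), which by (20) equal (1 - 2p_1) b_k.

```python
import math


def density_matrix(b, p1):
    """E_1(b, p1) = (1/2)(I + (1 - 2 p1) b.sigma), written out entrywise."""
    s = [(1 - 2 * p1) * x for x in b]
    return [[0.5 * (1 + s[2]), 0.5 * (s[0] - 1j * s[1])],
            [0.5 * (s[0] + 1j * s[1]), 0.5 * (1 - s[2])]]


def recover(e1):
    """Return (b, p1) from a qubit density matrix of the form (20), p1 < 1/2."""
    # Bloch vector components s_k = tr(E_1 sigma_k).
    s = [2 * e1[1][0].real, 2 * e1[1][0].imag, (e1[0][0] - e1[1][1]).real]
    length = math.sqrt(sum(x * x for x in s))
    if length == 0:
        raise ValueError("p1 = 1/2: the direction b is not determined")
    # length = 1 - 2 p1 > 0, so b is the normalized Bloch vector.
    return [x / length for x in s], (1 - length) / 2


b_true, p1_true = [0.6, 0.0, 0.8], 0.1      # illustrative values
b_rec, p1_rec = recover(density_matrix(b_true, p1_true))
```

The restriction $p_1 < 1/2$ is what fixes the sign of $b$; for $p_1 = 1/2$ the matrix is $\frac{1}{2}I$ and no direction can be recovered.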

Similarly, if the reported state is -1, and there is an a posteriori probability
$p_2 < 1/2$ that the correct value is +1, one gets the true state

$$E_2(b, p_2) = (1 - p_2)\tfrac{1}{2}(I - b \cdot \sigma) + p_2 \tfrac{1}{2}(I + b \cdot \sigma) = \tfrac{1}{2}(I - (1 - 2p_2)\, b \cdot \sigma) \qquad (21)$$

Of course a similar state can be found with prior probabilities instead of
posterior probabilities, say with a prior $\pi$ for +1. The very common case $\pi = 1/2$
gives a totally uninformative mixed state $\frac{1}{2}I$.

The transition probability to other pure or mixed states can be found from 
Corollary 2. 

12.3 Information on both types of errors. 

The calculation in the previous subsection was for the case where $p_1$ and $p_2$
were posterior probabilities for the two types of errors. But a similar calculation
applies for the case where $p_1$ and $p_2$ are replaced by predetermined error
probabilities as in statistical testing of hypotheses. For the classical theory of
hypothesis testing, any statistical textbook can be consulted.

So we will look upon a hypothesis testing problem with level $\alpha$ and power
$\beta$. In our connection this means the following: Before any data are obtained,
we make a programme stating what our decision procedure shall be. This goes
as follows: The decision shall be based upon an observator $t^b$ which also takes
the values +1 or -1, and which is a function of the data, adjusted by means of
the statistical model in such a way that the two error probabilities are fixed: If
the correct parameter is $\lambda^b$, then $P(t^b = -1 \mid \lambda^b = +1) = \alpha$ and
$P(t^b = +1 \mid \lambda^b = -1) = 1 - \beta$. In common statistical language this means that we are testing the
hypothesis $H_0: \lambda^b = +1$, and this hypothesis is rejected if $t^b = -1$. Then $\alpha$ is
the level of the test, the probability of wrong rejection, while $\beta$ is the power of
the test, the probability of rejecting the hypothesis when you should.

Note that this is still a state of the question/answer type, albeit in a more
advanced form: The question is given by the three-vector $b$ and the two predetermined
error probabilities $\alpha$ and $1 - \beta$. The answer is given by $\lambda^b = +1$,
say, which is the conclusion we claim if we observe $t = +1$. So the state must
involve all the quantities $b$, $\alpha$, $\beta$ and the answer ±1.

Say that we have done the experiment and reported the value +1. Then we
again will use a weighting according to the error probabilities, even though these
at the outset refer to different outcomes. Thus the weighted state will be

$$E = (1 - \alpha)\tfrac{1}{2}(I + b \cdot \sigma) + (1 - \beta)\tfrac{1}{2}(I - b \cdot \sigma) = \tfrac{1}{2}\bigl((2 - \alpha - \beta)I + (\beta - \alpha)\, b \cdot \sigma\bigr). \qquad (22)$$

This state corresponds to what Busch [3] and Caves et al. [5] call an effect
$E = \frac{1}{2}(rI + cu \cdot \sigma)$, and these effects played a crucial role in our proof of the
Born formula. In terms of the definition of an effect, we have $r = 2 - \alpha - \beta$ and
$c = \beta - \alpha$.

Again it is important that both $\alpha$, $\beta$ and the unit vector $b$ (with sign) can
be recovered once we know the effect matrix $E$.
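
The correspondence between $(\alpha, \beta)$ and $(r, c)$ can be made explicit by inverting $r = 2 - \alpha - \beta$ and $c = \beta - \alpha$. The sketch below is only an illustration; the function names are hypothetical, and `in_triangle` encodes the constraints $c \ge 0$ and $c \le r \le 2 - c$, which is an assumed reading of the triangle with corners (0,0), (2,0) and (1,1). The direction $b$ itself would be read off from the traceless part of $E$, since tr(E σ_k) = c b_k with $c > 0$.

```python
def effect_parameters(alpha, beta):
    """(r, c) of the effect in (22) for a test with level alpha and power beta."""
    return 2 - alpha - beta, beta - alpha


def level_and_power(r, c):
    """Invert r = 2 - alpha - beta, c = beta - alpha to recover (alpha, beta)."""
    return 1 - (r + c) / 2, 1 - (r - c) / 2


def in_triangle(r, c, tol=1e-12):
    """Check c >= 0 and c <= r <= 2 - c (triangle with corners (0,0), (2,0), (1,1))."""
    return -tol <= c and c - tol <= r <= 2 - c + tol


r, c = effect_parameters(0.05, 0.9)          # illustrative level and power
alpha, beta = level_and_power(r, c)
pure_state = effect_parameters(0.0, 1.0)     # error-free test
```

With no errors ($\alpha = 0$, $\beta = 1$) the pair $(r, c)$ is the corner (1, 1) of the triangle, the pure-state case.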

In the $(r, c)$-plane, the effects are limited to the triangle with corners (0,0),
(2,0) and (1,1). But note that $b$ can be replaced by $-b$ depending upon the
outcome, so the triangle obtained by taking the mirror image around the line
$c = 0$ is also relevant. The first triangle corresponds exactly to the limitation
imposed by the hypothesis testing interpretation proposed above:
The limitations imposed by the triangle are $c \ge 0$, corresponding to $\beta \ge \alpha$,
and $c \le r \le 2 - c$, corresponding to $\alpha \ge 0$ and $\beta \le 1$. The bottom line $c = 0$
corresponds to $\beta = \alpha$, a case where there is no information in the reported
result. The right boundary corresponds to $\alpha = 0$, and the left boundary to
$\beta = 1$.

When the two error probabilities are equal, $\alpha = 1 - \beta$, we get the mixed
effects. This may be interpreted as an agreement with a general property of
inference problems under symmetry (see Chapter 3 in [9]): When a prior is
chosen as an invariant measure of a transitive group, then Bayesian credibility
statements and confidence statements (corresponding to hypothesis testing
problems) are equivalent. In the present case the prior in question is just the
probability 1/2 on each of the values -1 and +1. And there is only one error
probability to report, in the hypothesis testing case as above, in the Bayesian
case the posterior probability corresponding to the result observed. With a
symmetrical prior we also have $P(t^b = +1 \mid \lambda^b = +1) = P(\lambda^b = +1 \mid t^b = +1)$
and so on in this case, so that everything that is done in this subsection for this
case is consistent with what was done in the previous subsection.

And of course, when $\alpha = 0$ and $\beta = 1$, both error probabilities are 0, and
we get the pure states on the Bloch sphere.

A simple hypothesis problem can be inverted by exchanging hypothesis and
alternative. Assume also that the reported result is opposite of what we had
above, that is, -1 instead of +1. As on the Bloch sphere, this corresponds to an
operator with the sign of $c$ inverted, but here also with the two error probabilities
exchanged. Explicitly, the operator will be

$$E_2(r, c, b) = \tfrac{1}{2}\bigl((2 - r)I - c\, b \cdot \sigma\bigr) = I - E, \qquad (23)$$

that is, $E + E_2 = I$, just as $v_1 v_1^\dagger + v_2 v_2^\dagger = I$ in the Bloch sphere case.

12.4 Generalized probability.

What is left now is to interpret the generalized probabilities assumed to be
associated with the effects in the proof of the Born formula, and to motivate
the additivity of these generalized probabilities. As there, we assume that we
start with a pure state $a$, and consider after that the generalized probability
$\pi(E)$ associated with an effect $E = E(r, c, u) = E(\alpha, \beta, u)$ as introduced above.
The crucial point is that these generalized probabilities must be assumed to
exist, here as a result of some symmetry considerations.

In [5], and as used in the above proof of Born's formula, the basic assumption
about the generalized probabilities is that $E_1 + E_2 + \cdots \le I$ should imply
$\pi(E_1 + E_2 + \cdots) = \pi(E_1) + \pi(E_2) + \cdots$. My intention is to indicate a proof of
the above property from the general assumptions of this paper. To this end, I
will need in addition to the previous Axioms a basic symmetry assumption, as
expressed below. This new Axiom states in effect that, when initial and final
states are transformed with the same group transformation, then the transition
probability remains the same.

Recall the group $K$ defined in Axiom 6.

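Axiom 9. When $\lambda^{a_1} = \lambda^a k$ and $\lambda^{b_1} = \lambda^b k$ with the same group element
$k \in K$, then

$$P(\lambda^{b_1} = \lambda_j k \mid \lambda^{a_1} = \lambda_i k) = P(\lambda^b = \lambda_j \mid \lambda^a = \lambda_i).$$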



Theorem 8. From the hypothesis testing interpretation and our general
assumptions we have that

$$\sum_i E_i \le I \quad \text{implies} \quad \pi\Bigl(\sum_i E_i\Bigr) = \sum_i \pi(E_i).$$



Proof: I will carry out the proof in two steps. Let
$E = E(r, c, u) = \frac{1}{2}(rI + cu \cdot \sigma)$, where $\|u\| = 1$, $0 \le c \le 1$, and $c \le r \le 2 - c$.

(i) For fixed $u$ we have that $\pi(\sum_i r_i, \sum_i c_i, u) = \sum_i \pi(r_i, c_i, u)$.
In the hypothesis testing interpretation

$$r = 2 - \alpha - \beta = (1 - \alpha) + (1 - \beta) = P(t = +1 \mid \lambda = +1) + P(t = +1 \mid \lambda = -1), \qquad (24)$$

where we have dropped the superindex a. Similarly

$$c = \beta - \alpha = (1 - \alpha) - (1 - \beta) = P(t = +1 \mid \lambda = +1) - P(t = +1 \mid \lambda = -1). \qquad (25)$$

Imagine now the following three experimental setups with the same experimental
direction a:

(1) A measurement apparatus (1), whose outcome may be formulated as a
hypothesis testing problem with level $\alpha_1$, power $\beta_1$ and test variable $t_1$.

(2) A similar measurement apparatus (2) for the same system with characteristics
$\alpha_2$, $\beta_2$, $t_2$.

For these two cases we find $(r_i, c_i)$ $(i = 1, 2)$ from (24) and (25).

(3) A symmetric coin is tossed to choose between the experiments (1) and (2).

The generalized probability $\pi$ for this case is the probability of some state,
specified by the question given by $r$, $c$ and $b$, or by $\alpha$, $\beta$ and $b$, and the conclusion
that the ideal answer is $u = b$ or $\lambda^b = +1$, i.e., the measurement gives $t^b = +1$.
Now from (24) and (25) we conclude that the $(r, c)$-value for experiment (3) is
$(\frac{1}{2}(r_1 + r_2), \frac{1}{2}(c_1 + c_2))$. Thus we conclude that

$$\pi\bigl(\tfrac{1}{2}(r_1 + r_2), \tfrac{1}{2}(c_1 + c_2), u\bigr) = \tfrac{1}{2}\pi(r_1, c_1, u) + \tfrac{1}{2}\pi(r_2, c_2, u).$$

Furthermore, by considering suitable changes in the level and power of the
test, again using (24) and (25), we see that $\pi(\frac{1}{2}r, \frac{1}{2}c, u) = \frac{1}{2}\pi(r, c, u)$ if both sets
of values $(\frac{1}{2}r, \frac{1}{2}c)$ and $(r, c)$ are within the triangle defining effects. Therefore
we conclude that $c_1 + c_2 \ge 0$ and $c_1 + c_2 \le r_1 + r_2 \le 2 - c_1 - c_2$ implies

$$\pi(r_1 + r_2, c_1 + c_2, u) = \pi(r_1, c_1, u) + \pi(r_2, c_2, u).$$

The extension to several basic experiments is straightforward.

(ii) The general case.

Use the same setup as in (i), but assume now that the experiments (1) and
(2) are measurements in different directions $b_1$ and $b_2$, that different test levels
and test powers may be used in the decision process, but that we are using
physical measurement apparata with identical display, showing the result $t = +1$
or $t = -1$. Assume that the rotation from $b_1$ to $b_2$ is defined in a specific way,
so that the two directions are in correspondence relative to the measurement
display. Again let experiment (3) consist of tossing a symmetric coin to choose
either experiment (1) or experiment (2).

Let $\lambda$ be the 'real' spin component in each of the three experiments, that is
in the direction of the chosen measurement display. Let $t_i$ $(i = 1, 2, 3)$ be the
observed spin component in experiment $i$. Then from the situation we have

$$P(t_3 = +1 \mid \lambda = +1) = \tfrac{1}{2}P(t_1 = +1 \mid \lambda = +1) + \tfrac{1}{2}P(t_2 = +1 \mid \lambda = +1), \qquad (26)$$

with a similar identity for $\lambda = -1$, and, according to (24), this implies

$$r_3 = \tfrac{1}{2}(r_1 + r_2). \qquad (27)$$

By the symmetry Axiom 9, the probability distributions of the outcomes of
experiments (1) and (2) will be the same if we imagine that they have the same
final vector $b_1$, but that the initial vector of the experiment (2) was rotated in
the opposite direction. Seen in this way, all three experiments have the same 
final vector, but different initial conditions. This implies that we can use an 
argument similar to the one used in part (i) also here, if we take the different 
initial conditions into account. In effect, the conclusion from this is that we use 
the argument that the generalized probability for experiment (3) must be the 
mean of the generalized probabilities for the experiments (1) and (2). Having 
agreed upon this, we return to the original setup with different final vectors and 
with formulae for effects depending upon these final vectors. 

From this argument we also conclude that it is natural to associate an effect
also to the artificial experiment (3). Note that this experiment can be arranged
in such a way that the person reading the measurement display does not know
that the experiment is artificial, i.e., that a coin has been tossed to choose
between experiments (1) and (2). Since the constants $r$, $c$ and $u$ only enter the
effect $E$ in the combinations $r$ and $cu$, we only have to define new values for
these quantities for the experiment (3).

Consider the expected observed spin component in experiment (1). If the
real parameter $\lambda$ is +1, this expectation will be

$$P(t_1 = 1 \mid \lambda = 1)\, b_1 - \bigl(1 - P(t_1 = 1 \mid \lambda = 1)\bigr)\, b_1$$

$$= -b_1 + 2P(t_1 = 1 \mid \lambda = 1)\, b_1 = -u_1 + 2P(t_1 = 1 \mid \lambda = 1)\, u_1. \qquad (28)$$

The similar expression for $\lambda = -1$ will be $u_1 - 2P(t_1 = +1 \mid \lambda = -1)\, u_1$. The
sum of these two expectations will be

$$2P(t_1 = 1 \mid \lambda = +1)\, u_1 - 2P(t_1 = 1 \mid \lambda = -1)\, u_1 = 2c_1 u_1$$

by (25), which gives an interpretation of the vector $c_1 u_1$ occurring in the effect
$E_1$. There is of course a similar expression for experiment (2), and since the
expectation of the outcome of experiment (3) must be the mean of the outcomes
of experiments (1) and (2), this experiment must be assigned an effect such that

$$c_3 u_3 = \tfrac{1}{2}(c_1 u_1 + c_2 u_2). \qquad (29)$$

Thus it follows from the argument in the previous paragraph:

$$\pi\bigl(\tfrac{1}{2}(r_1 + r_2), \tfrac{1}{2}(c_1 u_1 + c_2 u_2)\bigr) = \tfrac{1}{2}\bigl(\pi(r_1, c_1 u_1) + \pi(r_2, c_2 u_2)\bigr). \qquad (30)$$

Then we can adjust error probabilities in each of the experiments to get rid
of the factor 1/2 in the same way as in the previous case. This shows that in
general

$$\pi(E_1 + E_2) = \pi(E_1) + \pi(E_2) \qquad (31)$$

whenever the lefthand side is defined.

Some specific cases are easily verified. Begin with the case $E + E_2 = I$,
as in (23), and let $\pi(E) = \pi(r, c, b) = \pi(\alpha, \beta, b)$ have some value. Then we
must have that $\pi(E_2) = \pi(2 - r, -c, b) = 1 - \pi(r, c, b)$. In particular, for mixed
states $\pi(1, -c, b) = 1 - \pi(1, c, b)$, and as a particular case again, the totally
uninformative state has $\pi(\frac{1}{2}I) = \frac{1}{2}$. Furthermore, $\pi(0) = 0$ and $\pi(I) = 1$.

To find explicit formulae for the generalized probability $\pi(E)$, we first look at
the fact, proved by Caves et al. [5] from the additivity property now motivated
by Theorem 8, that $\pi(E)$ is linear in $E$, that is, for fixed $b$, linear in $r$ and
$c$. Fixing $\pi(b) = \pi(1, 1, b)$ for the corresponding pure state, and taking into
account the values found in the previous paragraph, this implies that

$$\pi(r, c, b) = \tfrac{1}{2}r + \bigl(\pi(b) - \tfrac{1}{2}\bigr)c.$$

In terms of the level and the power of the test, this is

$$\pi(\alpha, \beta, b) = 1 - \alpha\pi(b) - \beta(1 - \pi(b)).$$

As shown in the proof of Born's formula in the previous Section, the generalized
probability $\pi(b)$ takes the form $\pi(b) = \frac{1}{2}(1 + a \cdot b)$, where $a$ is the initial
pure state. This gives finally

$$\pi(r, c, b) = \tfrac{1}{2}(r + c\, a \cdot b), \qquad (32)$$

or

$$\pi(\alpha, \beta, b) = 1 - \tfrac{1}{2}(\alpha + \beta) + \tfrac{1}{2}(\beta - \alpha)\, a \cdot b. \qquad (33)$$
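
For readers who want the intermediate algebra, the substitution leading to (32) and (33) can be written out as follows. It assumes only the linear form above, $\pi(b) = \frac{1}{2}(1 + a \cdot b)$, and the relations $r = 2 - \alpha - \beta$, $c = \beta - \alpha$.

```latex
\begin{aligned}
\pi(r, c, b) &= \tfrac{1}{2} r + \bigl(\pi(b) - \tfrac{1}{2}\bigr) c
  = \tfrac{1}{2} r + \tfrac{1}{2} c\, a \cdot b
  = \tfrac{1}{2}(r + c\, a \cdot b), \\
\pi(\alpha, \beta, b) &= \tfrac{1}{2}(2 - \alpha - \beta) + \tfrac{1}{2}(\beta - \alpha)\, a \cdot b
  = 1 - \tfrac{1}{2}(\alpha + \beta) + \tfrac{1}{2}(\beta - \alpha)\, a \cdot b .
\end{aligned}
```

As a consistency check, $\alpha = 0$, $\beta = 1$ gives back the pure-state value $\frac{1}{2}(1 + a \cdot b)$, and $\alpha = \beta$ gives a value that does not depend on $a \cdot b$, in line with the remark that such a test carries no information.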

12.5 Probability from symmetry between largescale ex- 
periments. 

Four drugs A, B, C and D are being compared with respect to the expected
recovery time $\mu$ they induce on patients with a certain disease. There are
relatively few patients available, so one concentrates on getting information on
the sign of the difference between each $\mu$ and the mean of the others, for instance

$$\lambda^A = \mathrm{sign}\bigl(\mu_A - \tfrac{1}{3}(\mu_B + \mu_C + \mu_D)\bigr).$$

We will not go into detail with the experimental design here, but assume that
there is an efficient design, say, of an incomplete block type, where accurate
information can be obtained about one or a few such $\lambda$'s.

Just for this illustration assume that we from some experiment have obtained
very accurate information that $\lambda^A = +1$, and this is the only information we
have. Then we want to perform a new experiment with level $\alpha$ and power $\beta$ in
order to test a hypothesis that $\lambda^B$ also is +1. Can we get any prior information
about the result of this from the first experimental result? Informally, since $\mu_A$
is subtracted in the expression for $\lambda^B$, we should expect a probability less than
1/2 that $\lambda^B = +1$.

In this ideal case we can get accurate information from the theory above.
There is at the outset complete symmetry between the four binary parameters
$\lambda^A$, $\lambda^B$, $\lambda^C$ and $\lambda^D$. Permutational symmetry can always be imagined as imbedded
in some rotational symmetry. Here we can imagine rotation in 3-space, since
we can consider a regular tetrahedron in this space. The perpendiculars from
the corners A, B, C and D of that tetrahedron to the opposite side can then be
taken to represent the parameters $\lambda^A$, $\lambda^B$, $\lambda^C$ and $\lambda^D$, respectively. From what
is known about regular tetrahedrons, the angle between two perpendiculars is
approximately 109°, with a cosine equal to -1/3.

At least tentatively, this can be taken as a special case of the theory above.
Thus from (33), the prior probability of obtaining the result +1 in the last
experiment is

$$1 - \tfrac{1}{2}(\alpha + \beta) - \tfrac{1}{6}(\beta - \alpha) = 1 - \tfrac{1}{3}\alpha - \tfrac{2}{3}\beta. \qquad (34)$$

For the special case of an ideal experiment with $\alpha = 0$, $\beta = 1$, the probability
is just 1/3.

It seems likely that this probability assessment also can be calculated in
other ways. However, to us the very important point is that the calculation
here used the following two steps:

1) Using symmetry arguments, the probability can be assumed to exist. 

2) Then the formula for the probability follows from the above proof of 
Born's formula. 
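
The numbers in the tetrahedron example can be reproduced with a few lines of code. The sketch below is an illustration only: the vertex coordinates are one standard choice for a regular tetrahedron centred at the origin, and the function names are hypothetical. The direction from the centre to a corner is used for the perpendicular from that corner to the opposite side, since the two lie on the same line.

```python
import math

# One standard choice of vertices for a regular tetrahedron centred at the origin.
VERTICES = {
    "A": (1, 1, 1),
    "B": (1, -1, -1),
    "C": (-1, 1, -1),
    "D": (-1, -1, 1),
}


def cosine(p, q):
    """Cosine of the angle between two 3-vectors."""
    dot = sum(x * y for x, y in zip(p, q))
    return dot / math.sqrt(sum(x * x for x in p) * sum(y * y for y in q))


def prior_probability(alpha, beta, cos_ab):
    """Formula (33): 1 - (alpha + beta)/2 + (beta - alpha)/2 * a.b."""
    return 1 - 0.5 * (alpha + beta) + 0.5 * (beta - alpha) * cos_ab


cos_ab = cosine(VERTICES["A"], VERTICES["B"])     # equals -1/3
angle_deg = math.degrees(math.acos(cos_ab))       # roughly 109.47 degrees
p_ideal = prior_probability(0.0, 1.0, cos_ab)     # ideal experiment
p_test = prior_probability(0.05, 0.9, cos_ab)     # illustrative level and power
```

For the ideal experiment the result is 1/3, and for general level and power the value agrees with $1 - \frac{1}{3}\alpha - \frac{2}{3}\beta$ from (34).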

In the case of the above example, we had also the crucial fact that, even
though the parameter space had more dimensions, the symmetry could be reduced
to a symmetry in 3-space. On this space, with its probability space and
the resulting $L^2$ space as a Hilbert space, the above arguments for
the Born formula apply. And this gives a definite formula for the probability.

In my view, the probabilities in quantum mechanics can be argued for in a
similar way using a complex Hilbert space. Crucial points in the argument are
first the construction of a meaningful Hilbert space which is a proper subspace of
the $L^2$ space over the full (total) parameter space, then the result (completed in
Lemma 2 above) that the vectors of that Hilbert space stand in one-to-one correspondence
with the events related to the parameters of different experiments
that we are interested in.

This is a parallel to the argument in the above example. And my feeling is 
that a similar argument can be used in several other decision problems involving 
inference from one macroscopic experiment to other experiments. But this is 
outside the scope of the present paper. 

12.6 Further issues concerning decision problems. 

In the Example of the previous subsection, parameters taking only two values 
were considered. What is new in this example from a decision theory point of 
view, is that we include several potential questions about the same system in 
the consideration. 

More complicated decision problems may perhaps at least to some extent be
tackled by looking at the dichotomy $\lambda^b = \lambda^b_k$ versus $\lambda^b \ne \lambda^b_k$. Or for those who
know some hypothesis testing theory: What has been treated in this section
has been just the simple Neyman-Pearson case, but this is the basis for the
whole statistical theory of hypothesis testing.

In any case, it is satisfying that a discussion of decision problems related 
to statistical experiments can be coupled to basic issues related to a physical 
theory like quantum mechanics. This may point at a new unification of scientific 
methods. 

13 Basic formulae of quantum mechanics. 

Having proved Born's formula, many of the results of standard quantum mechanics
follow. The results of this Section are not claimed to be new of course;
the point is that they can be derived from the above formulation.

Our state concept may be summarized as follows: To the state $\lambda^a(\cdot) = \lambda^a_k$
there corresponds the state vector $v^a_k$, and these vectors determine the transition
probabilities for perfect experiments as in (17).

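Theorem 9. (i) $E(\lambda^b \mid \lambda^a = \lambda^a_k) = v^{a\dagger}_k T^b v^a_k$, where $T^b = \sum_j \lambda^b_j v^b_j v^{b\dagger}_j$.
(ii) $E(f(\lambda^b) \mid \lambda^a = \lambda^a_k) = v^{a\dagger}_k f(T^b) v^a_k$, where $f(T^b) = \sum_j f(\lambda^b_j) v^b_j v^{b\dagger}_j$.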

Proof: Straightforward application of Theorem 7. 

Thus, in ordinary quantum mechanical terms, the expectation of every ob- 
servable in any state is given by the familiar formula. 

It follows from Theorem 9(i) and from the preceding discussion that the first 
three rules of Isham as cited in the Introduction above, taken there as a 
basis for quantum mechanics, are satisfied. 

Now turn to non-perfect experiments. In ordinary statistics, a measurement
is a probability measure $Q^\theta(dy)$ depending upon a parameter $\theta$. Assume now
that such a measurement depends upon the parameter $\lambda^b$, while the current
state is given by $\lambda^a = \lambda^a_k$. Then as in Theorem 9 (ii):

Theorem 10. (i) Corresponding to the experiment $b \in \mathcal{A}$ one can define
an operator-valued measure $M$ by $M(dy) = \sum_j Q^{\lambda^b_j}(dy)\, v^b_j v^{b\dagger}_j$. Then, given the
initial state $\lambda^a = \lambda^a_k$, the probability distribution of the result of experiment $b$ is
given by $P[dy \mid \lambda^a = \lambda^a_k] = v^{a\dagger}_k M(dy) v^a_k$.

(ii) These operators satisfy $M[S] = I$ for the whole sample space $S$, and
furthermore $\sum_i M(A_i) = M(A)$ for any finite or countable sequence of disjoint
elements $\{A_1, A_2, \ldots\}$ with $A = \cup_i A_i$.

Theorem 10(ii) is easily checked directly. 
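
For the spin 1/2 case with an imperfect apparatus, Theorem 10 can be illustrated by a small computation. The sketch below is an assumption-laden illustration rather than the paper's own construction: it takes the two-point sample space {+1, -1}, uses the error probabilities of Section 12.3 (probability $\alpha$ of reporting -1 when the value is +1, and $1 - \beta$ of reporting +1 when the value is -1) as the measure $Q$, and the function names are hypothetical.

```python
def projector(b, sign):
    """(1/2)(I + sign * b.sigma) for a unit 3-vector b, written out entrywise."""
    x, y, z = (sign * t for t in b)
    return [[0.5 * (1 + z), 0.5 * (x - 1j * y)],
            [0.5 * (x + 1j * y), 0.5 * (1 - z)]]


def combine(w1, m1, w2, m2):
    """Return the 2x2 matrix w1 * m1 + w2 * m2."""
    return [[w1 * m1[i][j] + w2 * m2[i][j] for j in range(2)] for i in range(2)]


def trace_of_product(x, y):
    """Return tr(x y) for 2x2 matrices."""
    return sum(x[i][j] * y[j][i] for i in range(2) for j in range(2))


def povm(b, alpha, beta):
    """Operators M(+1), M(-1) built as sum_j Q(y | lambda_j) v_j v_j^dagger."""
    e_plus, e_minus = projector(b, +1), projector(b, -1)
    return {+1: combine(1 - alpha, e_plus, 1 - beta, e_minus),
            -1: combine(alpha, e_plus, beta, e_minus)}


a, b = (0.0, 0.0, 1.0), (0.6, 0.0, 0.8)     # illustrative initial and final directions
alpha, beta = 0.05, 0.9                     # illustrative level and power
m = povm(b, alpha, beta)
total = combine(1, m[+1], 1, m[-1])         # M(S); Theorem 10(ii) says this is I
probs = {y: trace_of_product(projector(a, +1), m[y]).real for y in m}
```

Here `m[+1]` coincides with the effect (22), `total` is the identity matrix, and `probs[+1]` equals $\frac{1}{2}(r + c\, a \cdot b)$ with $r = 2 - \alpha - \beta$ and $c = \beta - \alpha$.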

A more general state assumption is a Bayesian one corresponding to this 
setting. From Theorem 10(ii) we easily find: 

Theorem 11. Let the current state be given by probabilities $\pi(\lambda^a_k)$ for different
values of $\lambda^a_k$. Then, defining $\rho = \sum_k \pi(\lambda^a_k) v^a_k v^{a\dagger}_k$, we get $P[dy] = \mathrm{tr}[\rho M(dy)]$.

These results are the basis for much of quantum theory, in particular for the 
quantum statistical inference in Barndorff-Nielsen et al. PQ; for a formulation, 
see also Isham [12].

Note that the density matrix $v^a_k v^{a\dagger}_k$ is equivalent to the pure state $v^a_k$; similarly,
a density matrix $v^b_j v^{b\dagger}_j$ is equivalent to the statement that an ideal measurement
giving $\lambda^b = \lambda^b_j$ just has been performed. By straightforward application
of Born's formula one gets

Theorem 12. Assume an initial state $v^a_k$, and assume that an ideal measurement
of $\lambda^b$ has been performed without knowing that value. Then this state
is described by a density matrix $\sum_j |v^{a\dagger}_k v^b_j|^2\, v^b_j v^{b\dagger}_j$.

This is the celebrated and much discussed projection postulate of von Neumann.
Writing $P_j = v^b_j v^{b\dagger}_j$ and $\rho = v^a_k v^{a\dagger}_k$ here, the j'th term in the last formula
can be written $P_j \rho P_j$, which is a special case of the Dirac-von Neumann formula

$$\sum_j P_j \rho P_j.$$

Thus the so-called von Neumann measurements can be viewed as ordinary
statistical measurements in our modelling approach.

In general we have assumed for simplicity in this Section that the state
vectors are nondegenerate eigenvectors of the corresponding operators, meaning
that the parameter $\lambda^a$ contains all relevant information about the system. This
can be generalized, however.

14 Conclusion. 

Although this paper contains fairly much substance, it could also be regarded 
as a starting point of a wider and more complete theory. Already now it goes 
pretty far towards giving a foundation for treating continuous variables like 
positions and momenta. Also, a theory for composite systems follows easily. 
A very natural task would be to complete the extension of the theory to non- 
perfect experiments in order to treat real measurement apparata. Several other 
extensions can be imagined. 

An important future task will be to try to discuss entanglement under this 
umbrella. It seems clear that, even if it should be possible to define some sort 
of total parameter for this case, no posterior probability could ever be defined 
on this total parameter. Such probabilities should be confined to results of 
experiments, and, as pointed out by Hess and Philipp JT] and others, even the 
existence of joint probabilities connected to pairs of experiments does not imply
that probabilities on the full parameter space exist.

A general first goal for a theory like this could be to satisfy the requirements 
by Volovich but we can also imagine developments beyond that. But I re- 
peat my most important conclusion: The basis sketched here can be interpreted 
in a non-formal way, so it can be understood intuitively; furthermore, it can 
be related to ordinary statistical thinking, and thus has a link to methodology 
used in other empirical sciences. 

There are several other approaches to quantum mechanics in the literature,
but the only other really non-formal approach that I know about is the one by
Hardy who defines a state as a list of probability distributions associated
with every outcome of every possible experiment. There should of course be
relationships between these different ways of viewing the world, but to discuss
that is beyond the scope of the present paper.